7.2 Confidence Interval Examples

7.2.1 Proportion Outcome

Let’s return to the flight data and estimate the proportion of afternoon flights based on a sample of 100 flights from NYC.

First, the classical 95% confidence interval can be constructed using the theory of the Binomial Model (Do the 3 conditions hold? Is n large enough for it to look Normal?)

flights_samp %>% 
  summarize(prop = count(day_hour)/100) %>%
  mutate(SE = sqrt(prop*(1-prop)/100)) %>%
  mutate(lb = prop - 2*SE, ub = prop + 2*SE)

## Warning: `data_frame()` is deprecated as of tibble 1.1.0.
## Please use `tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

## # A tibble: 1 x 4
##    prop     SE    lb    ub
##   <dbl>  <dbl> <dbl> <dbl>
## 1  0.53 0.0499 0.430 0.630

Or we could bootstrap and get our confidence interval that way.

alpha <- 0.05

boot_data <- mosaic::do(1000)*( 
    flights_samp %>% # Start with the SAMPLE (not the FULL POPULATION)
      sample_frac(replace = TRUE) %>% # Generate by resampling with replacement
      summarize(prop = count(day_hour)/100) # Calculate statistics
)

boot_data %>%
  summarize(lower = quantile(prop, alpha/2),
    upper = quantile(prop, 1-alpha/2))

##   lower upper
## 1  0.44  0.63

Our confidence interval gives a sense of the true proportion of flights departed NYC in the afternoon, keeping in mind that this sample could be one of the unlucky samples (the 5%) that have intervals that don’t contain the true value.

7.2.2 Mean and then Median

Perhaps you really care about the arrival delay time because you have somewhere important you need to be when you take flights out of NYC. Let’s estimate the mean arrival delay based on a sample of 100 flights from NYC.

First off, let’s create a classical confidence interval. Since our sample size is relatively large, we can use the Normal model (instead of William Gosset’s work). We use the sample standard deviation and plug into the SE formula.

flights_samp %>% 
  summarize(mean = mean(arr_delay), s = sd(arr_delay)) %>%
  mutate(SE = s/sqrt(100)) %>%
  mutate(lb = mean - 2*SE, ub = mean + 2*SE)

## # A tibble: 1 x 5
##    mean     s    SE     lb    ub
##   <dbl> <dbl> <dbl>  <dbl> <dbl>
## 1  6.36  33.9  3.39 -0.422  13.1

alpha <- 0.05

boot_data <- mosaic::do(1000)*( 
    flights_samp %>% # Start with the SAMPLE (not the FULL POPULATION)
      sample_frac(replace = TRUE) %>% # Generate by resampling with replacement
      summarize(means = mean(arr_delay),medians = median(arr_delay)) # Calculate statistics
)

boot_data %>%
  summarize(lower = quantile(means, alpha/2),
    upper = quantile(means, 1-alpha/2))

## # A tibble: 1 x 2
##   lower upper
##   <dbl> <dbl>
## 1 0.230  13.4

Our confidence interval gives potential values for of the true mean arrival delay for flights that departed NYC, keeping in mind that this sample could be one of the unlucky samples (the 5%) that have intervals that don’t contain the true value. Also remember that the mean is sensitive to outliers…Let’s consider the median.

We get a slightly different story if we are interested in the middle number versus the average. But notice, we aren’t using a classical CI here because the sampling distribution of a median is not necessarily Normal.

boot_data %>%
  summarize(lower = quantile(medians, alpha/2),
    upper = quantile(medians, 1-alpha/2))

## # A tibble: 1 x 2
##   lower upper
##   <dbl> <dbl>
## 1  -8.5   5.5

7.2.3 Logistic Regression Model

Are the same relative numbers of morning flights in the winter and the summer? Let’s fit a logistic regression model and see what our sample says.

flights_samp$afternoon = flights_samp$day_hour == 'afternoon'

glm.afternoon <- glm(afternoon ~ season, data = flights_samp, family = 'binomial')
summary(glm.afternoon)

## 
## Call:
## glm(formula = afternoon ~ season, family = "binomial", data = flights_samp)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.332  -1.126   1.030   1.030   1.230  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -0.1226     0.2863  -0.428    0.668
## seasonwinter   0.4793     0.4036   1.188    0.235
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 138.27  on 99  degrees of freedom
## Residual deviance: 136.85  on 98  degrees of freedom
## AIC: 140.85
## 
## Number of Fisher Scoring iterations: 4

The output for the model gives standard errors for the slopes, so we can create the classical confidence intervals directly from the output,

confint(glm.afternoon)

## Waiting for profiling to be done...

##                   2.5 %    97.5 %
## (Intercept)  -0.6905940 0.4389059
## seasonwinter -0.3081988 1.2791414

Or with bootstrapping,

boot_data <- mosaic::do(1000)*( 
    flights_samp %>% # Start with the SAMPLE (not the FULL POPULATION)
      sample_frac(replace = TRUE) %>% # Generate by resampling with replacement
      glm(afternoon ~ season, data = ., family = 'binomial') # Calculate statistics
)

boot_data %>%
  summarize(lower = quantile(seasonwinter, alpha/2),
    upper = quantile(seasonwinter, 1-alpha/2))

##        lower    upper
## 1 -0.3503645 1.367213

Knowing that the sample is random, the interval estimate for the logistic regression slope is given by the confidence interval. But for logistic regression, we exponentiate the slopes to get an more interpretable value, the odds ratio. Here, we are comparing the odds of having a flight in the afternoon between winter months (numerator) and summer months (denominator). Is 1 in the interval? If so, what does that tell you?

exp(confint(glm.afternoon))

## Waiting for profiling to be done...

##                  2.5 %   97.5 %
## (Intercept)  0.5012782 1.551009
## seasonwinter 0.7347692 3.593553

boot_data %>%
  summarize(lower = exp(quantile(seasonwinter, alpha/2)),
    upper = exp(quantile(seasonwinter, 1-alpha/2)))

##       lower    upper
## 1 0.7044313 3.924396

7.2.4 Linear Regression Model Slope (Categorical Variable)

Everything says that there are longer delays in winter. Is that actually true? Let’s fit a linear regression model to test it

lm.delay <- lm(arr_delay ~ season, data = flights_samp)
summary(lm.delay)

## 
## Call:
## lm(formula = arr_delay ~ season, data = flights_samp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.694 -21.694  -8.694  11.547 166.306 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)    6.6939     4.8687   1.375    0.172
## seasonwinter  -0.6547     6.8176  -0.096    0.924
## 
## Residual standard error: 34.08 on 98 degrees of freedom
## Multiple R-squared:  9.408e-05,	Adjusted R-squared:  -0.01011 
## F-statistic: 0.009221 on 1 and 98 DF,  p-value: 0.9237

The classical CI for the slope is given with by

confint(lm.delay)

##                   2.5 %   97.5 %
## (Intercept)   -2.967923 16.35568
## seasonwinter -14.183889 12.87457

or with bootstrapping,

boot_data <- mosaic::do(1000)*( 
    flights_samp %>% # Start with the SAMPLE (not the FULL POPULATION)
      sample_frac(replace = TRUE) %>% # Generate by resampling with replacement
      lm(arr_delay ~ season, data = .) # Calculate statistics
)

boot_data %>%
  summarize(lower = quantile(seasonwinter, alpha/2),
    upper = quantile(seasonwinter, 1-alpha/2))

##      lower  upper
## 1 -15.0511 11.794

The 95% confidence interval gives a sense of the difference of mean arrival delays of flights between winter and summer is given by the confidence interval. Is zero in the interval? If so, what does that tell you?

7.2.5 Linear Regression Model Slope (Quantitative Var)

How well can the departure delay predict the arrival delay? What is the effect of departing 1 more minute later? Does that correspond to 1 minute later in arrival on average? Let’s look at the estimated slope between departure and arrival delays for the sample of 100 flights from NYC.

lm.delay2 <- lm(arr_delay ~ dep_delay, data = flights_samp)
summary(lm.delay2)

## 
## Call:
## lm(formula = arr_delay ~ dep_delay, data = flights_samp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.396 -12.397  -2.904   9.588  60.529 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.58743    1.85522  -1.395    0.166    
## dep_delay    1.00420    0.06173  16.267   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.72 on 98 degrees of freedom
## Multiple R-squared:  0.7297,	Adjusted R-squared:  0.727 
## F-statistic: 264.6 on 1 and 98 DF,  p-value: < 2.2e-16

The classical CI for the slope is given with by

confint(lm.delay2)

##                  2.5 %   97.5 %
## (Intercept) -6.2690537 1.094198
## dep_delay    0.8816967 1.126705

or with bootstrapping,

boot_data <- mosaic::do(1000)*( 
    flights_samp %>% # Start with the SAMPLE (not the FULL POPULATION)
      sample_frac(replace = TRUE) %>% # Generate by resampling with replacement
      lm(arr_delay ~ dep_delay, data = .) # Calculate statistics
)

boot_data %>%
  summarize(lower = quantile(dep_delay, alpha/2),
    upper = quantile(dep_delay, 1-alpha/2))

##       lower    upper
## 1 0.9173677 1.181704

If the flight leaves an additional minute later, then we’d expect the arrival delay to be increased by a value in the interval above, on average.

7.2.6 Confidence Intervals for Prediction

Imagine we are on a plane, we left 15 minutes late, how late will arrive? Since we only have a sample of 100 flights, we are a bit unsure of our prediction.

A classical CI can give us an interval estimate of what the prediction should be (if we had data on all flights).

predict(lm.delay2, newdata = data.frame(dep_delay = 15), interval = 'confidence')

##        fit      lwr      upr
## 1 12.47558 8.881202 16.06996

This is taking into account how uncertain we are about our model prediction because our model is based on sample data rather than population data.

7.2.7 Prediction Intervals

We also know that every flight is different (different length, different weather conditions, etc), so the true arrival delay won’t be exactly what we predict.

So to get a better prediction for our arrival delay, we can account for the size of errors or residuals by creating a prediction interval. This interval will be much wider than the confidence interval because it takes into account how far the true values are from the prediction line.

predict(lm.delay2, newdata = data.frame(dep_delay = 15), interval = 'prediction')

##        fit       lwr      upr
## 1 12.47558 -22.86868 47.81985

7.2.8 Probability Theory vs. Bootstrapping

In the modern age, computing power allows us to perform boostrapping easily to create confidence intervals. Before computing was as powerful as it is today, scientists needed mathematical theory to provide simple formulas for confidence intervals.

If certain assumptions hold, the mathematical theory proves to be just as accurate and less computationally-intensive than bootstrapping. Many scientists using statistics right now learned the theory because when they learned statistics, computers were not powerful enough to handle techniques such as bootstrapping.

Why do we teach both the mathematical theory and bootstrapping? You will encounter both types of techniques in your fields, and you’ll need to have an understanding of what these techniques are to bridge the gap until statistical inference uses modern computational techniques more widely.