---
title: "Hypothesis testing details and practice (Notes)"
subtitle: "STAT 155"
author: "Your Name"
format:
  html:
    toc: true
    toc-depth: 2
    embed-resources: true
---


```{r setup}
#| include: false
knitr::opts_chunk$set(
  collapse = TRUE, 
  warning = FALSE,
  message = FALSE,
  error = TRUE,
  fig.height = 2.75, 
  fig.width = 4.25,
  fig.env = 'figure',
  fig.pos = 'h',
  fig.align = 'center')

# Use a color blind friendly color palette throughout doc
library(tidyverse)
cb_palette <- c( "#0072B2", "#D55E00", "black", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#CC79A7")
scale_colour_discrete <- function(...) scale_colour_manual(values = cb_palette, ...)
scale_fill_discrete   <- function(...) scale_fill_manual(values = cb_palette, ...)
theme_set(theme_bw())
```



## Warm-up {-}

**GOAL**

Understand the *population* model of hiking `time` (in hours) by the highest `elevation` of the hike (in 1000s of feet):
    
E(time | elevation) = $\beta_0$ + $\beta_1$ elevation

We can make *inferences* about this relationship using data on a sample of hikes in the `peaks` data:

```{r eval=TRUE}
# Load packages & data
library(tidyverse)
peaks <- read.csv("https://mac-stat.github.io/data/high_peaks.csv") %>% 
  mutate(elevation = elevation/1000)

# Plot the relationship
peaks %>% 
  ggplot(aes(x = elevation, y = time)) + 
  geom_point()

# Model the relationship
hike_model <- lm(time ~ elevation, data = peaks)
coef(summary(hike_model))

# Get CIs
confint(hike_model)
```


The *sample model* is

E(time | elevation) = $\hat{\beta_0}$ + $\hat{\beta_1}$ elevation = 11.21 - 0.13 elevation 



Thus:

- We **estimate** that, for every additional 1000 feet in elevation, the expected / average hiking time decreases by 0.13 hours (~8 minutes).

- We expect that this estimate might be off by / have an **error** of 1.18 hours per 1000 feet.

- We're **95% confident** that, in the *broader population of hikes*, an additional 1000ft in elevation is associated with anywhere from a 2.50 hour *decrease* to a 2.24 hour *increase* in expected / average hiking time. Thus we do NOT have *statistically significant* evidence that hiking time is associated with elevation (a change of 0 is in this interval, hence is a plausible value).



### Example 1: Starting a formal hypothesis test

$H_0$: hiking time is *not* associated with elevation    
$H_a$: hiking time *is* associated with elevation

Translate this to $\beta$ notation:

$H_0$: 

$H_a$: 

NOTE: We will evaluate these hypotheses using a 0.05 **significance level**.
It's important to choose our significance level *before* starting our test to avoid post-hoc "tweaks" that suit our narrative.



### Example 2: What would we expect under $H_0$?

Suppose $H_0$ were true, i.e. $\beta_1$ were 0.

By the Central Limit Theorem, we'd expect the sampling distribution of possible $\hat{\beta}_1$ sample estimates to be Normally distributed around 0 with the standard error obtained from our model summary table:

$$
\hat{\beta}_1 \sim N(\beta_1, SE(\hat{\beta}_1)^2)
\;\;\; \Rightarrow \;\;\;
\hat{\beta}_1 \sim N(0, 1.18^2)
$$

Based on the corresponding plot below, is *our* sample estimate of $\hat{\beta}_1 = -0.13$ consistent with $H_0$?

```{r fig.width = 8, eval=TRUE}
# IGNORE THE SYNTAX!!!
# Put OUR sample estimate here
est <- -0.13

# Put the corresponding standard error here
se <- 1.18

data.frame(x = 0 + c(-4:4)*se) %>% 
  mutate(y = dnorm(x, sd = se)) %>% 
  ggplot(aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = se)) +
  geom_segment(aes(x = x, xend = x, y = 0, yend = y), linetype = "dashed") + 
  scale_x_continuous(breaks = c(-4:4)*se) + 
  geom_vline(xintercept = est, color = "blue") +
  labs(y = "density", x = "possible estimates IF H0 were true") 
``` 


### Example 3: Test statistic

```{r eval=TRUE}
coef(summary(hike_model))
```


a. Calculate the test statistic for our hypothesis test.



b. Interpret the test statistic: Our sample estimate of $\beta_1$ is ___ [how many] standard errors ___ [above, below] the null value of ___. 


c. So is our sample data compatible with $H_0$?






### Example 4: p-value

a. Use the 68-95-99.7 Rule with the sketch from Example 2 to *approximate* the p-value:    
    - less than 0.003
    - between 0.003 and 0.05
    - between 0.05 and 0.32
    - bigger than 0.32


b. Report the *exact* p-value from the model `summary()`.    

```{r}
coef(summary(hike_model))
```



c. How can we interpret this p-value? Choose 1.    
    - Given our sample data, it's likely that hiking time is associated with elevation (i.e. that $H_a$ is true). $$P(H_a | data) = \text{prob of $H_a$ being true given our data}$$
    - Given our sample data, it's likely that hiking time is not associated with elevation (i.e. that $H_0$ is true). $$P(H_0 | data) = \text{prob of $H_0$ being true given our data}$$
    - If in fact there were no relationship between hiking time and elevation in the broader population of hikes (i.e. $H_0$ were true), it's likely that we'd have gotten our observed decrease of hiking time with elevation "by chance". $$P(data | H_0) = \text{prob of observing our data given $H_0$ is true}$$
    
d. So is our sample data compatible with $H_0$?




### Example 5: Conclusion

a. So, at the 0.05 significance level, what's our conclusion? Is there a **statistically significant** association between hiking time and elevation? 

b. Does this conclusion agree with the one we made using the confidence interval for $\beta_1$, (-2.50, 2.24)?

c. Does this conclusion agree with what we'd conclude from the confidence bands below?

```{r eval=TRUE}
peaks %>% 
  ggplot(aes(x = elevation, y = time)) + 
  geom_point() + 
  geom_smooth(method = "lm")
```



\
\
\
\

::: {.callout-note title = "CIs, test statistics, and p-values"}

We've observed 3 equivalent approaches to determining whether or not to reject the null hypothesis that $\beta$ equals some null value.

Assuming a 95% confidence level:

- If the **95%** CI for $\beta$ does not include the null value, reject $H_0$!
- If the p-value < **0.05**, reject $H_0$!
- If the |test statistic| > **2**, reject $H_0$! 


:::

\
\
\
\


### Example 6: tests for the intercept

$H_0: \beta_0 = 0$    
$H_a: \beta_0 \ne 0$    

```{r}
coef(summary(hike_model))
```

a. The p-value for this test is below 0.05 (0.036). What do you conclude?    

    - We have statistically significant evidence of a relationship between hiking time and elevation.
    - We have statistically significant evidence that there's no relationship between hiking time and elevation.
    - We have statistically significant evidence that the average hiking time of a sea level (0 elevation) hike is non-0.
    - We have statistically significant evidence that the average hiking time of a sea level hike is non-0.    

b. Is this test useful?







\
\
\
\






## Exercises {-}

### Exercise 1: Investigating flat caps (and intercept terms)

**Research Question:**
Can we predict whether or not a mushroom is poisonous based on the shape of its cap?

For this exercise, we will look at data from various species of *gilled* mushrooms in the Agaricus and Lepiota Families. We have information on whether a mushroom is `poisonous` (TRUE if it is, FALSE if it's edible) and its `cap_shape` (a categorical variable with 6 categories):

```{r}
# Load the data & packages
library(tidyverse)
mushrooms <- read_csv("https://Mac-STAT.github.io/data/mushrooms.csv") %>% 
  mutate(cap_shape = relevel(as.factor(cap_shape), ref = "flat")) %>%
  select(poisonous, cap_shape)
```

Note that in the code above, we have forced the reference category for the cap_shape predictor to be flat, though its not first alphabetically:

```{r}
head(mushrooms)
mushrooms %>% 
  count(cap_shape)
```

#### Part a

One of the most poisonous species of mushrooms is the *Amanita phalloides* or ["Death Cap" mushroom](http://www.bccdc.ca/health-info/prevention-public-health/death-cap-mushrooms), which typically has a flat cap shape when mature. Based on this anecdote, we hypothesize that species of mushrooms with flat caps in general may be *more* likely to be poisonous than edible.

First, let's translate this question to an appropriate null and alternative hypothesis that we can compare with a formal hypothesis test. Remember that `poisonous` is a binary outcome, so we need to frame our hypotheses in terms of odds (i.e., Odds(poisonous | flat cap) = P(poisonous|flat cap)/P(edible | flat cap)).


#### Part b

Fit a logistic regression model to investigate if a mushroom being `poisonous` is associated with its `cap_shape`.

THINK: What's the appropriate model -- normal or logistic regression?
NOTE: Remember that `flat` is the reference category in our table!

```{r}
mushroom_mod1 <- ___(  )

coef(mushroom_mod1)
```



#### Part c

Provide an appropriate interpretation of the estimated intercept coefficient **on the odds scale**. Based on this interpretation, do you believe mushrooms with flat caps are more likely to be poisonous, or more likely to be edible?



#### Part d

Let's look at the full model summary:

```{r}
summary(mushroom_mod1)
```

Report and interpret the test statistic for the intercept term (our coefficient of interest):


#### Part e

- Report and interpret the p-value for the intercept term.
- Based on this p-value and a significance level of 0.05, do we have evidence that mushrooms with flat caps are more likely to be poisonous than edible?







### Exercise 2: Investigtating different cap types

Let's continue to explore different elements of our mushroom model:

```{r}
coef(summary(mushroom_mod1))
```

#### Part a

Now suppose we are interested in whether the odds of being poisonous are different for mushrooms with other cap shapes. 

By hand, using the above coefficient table, calculate the odds of being poisonous for mushrooms with flat, knobbed, and conical caps.
Remember: the non-exponentiated coefficients represent a difference in *log-odds* compared to the reference category):

> odds(poisonous | flat cap) =

> odds(poisonous | knobbed cap) =

> odds(poisonous | conical cap) =

#### Part b

Relatedly, interpret the *exponentiated* estimates of the `cap_shapeconical` and `cap_shapeknobbed` coefficients.

Remember: these are *odds ratios*, comparing to `flat` cap mushrooms.

#### Part c

Based on these odds ratios, which of the 3 mushroom cap shapes would you be the *most* worried about eating: `flat`, `conical`, or `knobbed`?
The *least* worried about eating?


#### Part d

Let's get the full model summary again:

```{r}
summary(mushroom_mod1)
```

Now report and interpret the p-values for the coefficients corresponding to `cap_shapeknobbed` and `cap_shapeconical`.


#### Part e

Putting this all together, if you were given a plate of mushrooms with different cap shapes and had to pick one to eat, which one would you choose? Which cap shape would you absolutely avoid at all costs? Are your decisions guided by the coefficient estimates, the p-values, or both?


#### Part f

Let's look at the data a slightly different way, using a 6x2 table of counts:

```{r}
mushrooms %>% 
  filter(cap_shape %in% c("flat", "knobbed", "conical")) %>% 
  count(cap_shape, poisonous) %>% 
  pivot_wider(names_from = poisonous, values_from = n, names_prefix = "Poisonous = ")

```

Now, if you were given a plate of mushrooms with different cap shapes and had to pick one shape to eat and one to absolutely avoid, would you choose the same shapes? Why or why not?




### Exercise 3

For this exercise, let's return to the fish dataset from a previous activity.

```{r}
fish <- read_csv("https://Mac-STAT.github.io/data/Mercury.csv")

head(fish)
```

**Research question:** We believe the length of a fish (measured in centimeters) is causally associated with its mercury concentration (measured in parts per million [ppm]). We suspect that the river a fish is sampled from may be a confounder, since differences in the river environment may causally influence both the average length of fish (e.g. due to differences in water temperature or food availability) as well as mercury concentration (e.g. due to differences between the two rivers in mercury pollution levels). 

#### Part a

Fit a linear regression model that can be used to answer our research question.

```{r}
mod_fish1 <- ___
summary(mod_fish1)
```

#### Part b

Interpret the coefficient estimate, test statistic, and p-value for the `RiverWacamaw` coefficient. Assume we have specified a significance level of 0.05. 

> **Response**

#### Part c

Suppose we now want to determine if the causal effect of fish length on mercury concentration *differs* according to the river from which a fish was sampled. 

First, modify the code chunk below to visualize the 3-way relationship between the `Concen`, `Length`, and `River` variables.

```{r}
fish %>% 
  ggplot(aes(x = ___, y = ___, colour = ___)) + 
  # [ADDITIONAL GGPLOT LAYER(S)]
```

Next, fit an appropriate linear regression model with an interaction term to investigate this question.

```{r}
mod_fish2 <- ___
summary(mod_fish2)
```

#### Part d

Interpret the coefficient estimate, test statistic, and p-value for the `RiverWacamaw:Length` interaction term in this revised model (`mod_fish2`). Assume we've set a significance level of 0.05. 


#### Part e

Interpret the coefficient estimate, test statistic, and p-value for the `RiverWacamaw` coefficient in this revised model (`mod_fish2`). (again, you can assume we've set a significance level of 0.05).

#### Part f (CHALLENGE)

Suppose another researcher runs the same model we fit in part c above (`mod_fish2`), but they claim that a more appropriate alternative hypothesis should be $\beta_1 < 0$, (and not $\beta_1 \ne 0$, as is assumed by default when running a regression model). Because of this, they reported a smaller p-value for the coefficient, and claim that the Wacamaw River has a lower baseline mercury concentration (i.e., when `Length = 0cm`).

- What is the p-value they would have reported for the `RiverWacamaw` coefficient in `mod_fish2`?

- What is a potential ethical problem with the other researcher's claim that the alternative hypothesis should be $\beta_1 < 0$? 

#### Part g (CHALLENGE)

You point out to the other researcher that the intercept and `RiverWacamaw` coefficients are both negative, so whatever difference in mercury concentration between the two rivers your model predicts "at baseline" is not useful or meaningful--you cannot have a fish that is 0cm long, nor a mercury concentration < 0ppm.

You propose that a more appropriate model should transform the `Length` variable in some way to make the intercept more interpretable. Create a new variable named `Length_adj` with this transformation and use it to re-fit the model:

```{r}
# Create a new variable

# Then refit the model
mod_fish3 <- lm(Concen ~ Length_adj*River, data=fish)
summary(mod_fish3)
```

Compare the output of this model to that of `mod_fish2`. What happened to the estimate, test statistic, and p-value for the `RiverWacamaw` coefficient? How does this affect your conclusion? How about the other researcher's conclusion? 


## Done!

- Finalize your notes: (1) Render your notes to an HTML file; (2) Inspect this HTML in your Viewer -- check that your work translated correctly; and (3) Outside RStudio, navigate to your `inclass_activities` subfolder within your `stat155` folder and locate the HTML file -- you can open it again in your browser.
- Clean up your RStudio session: End the rendering process by clicking the 'Stop' button in the 'Background Jobs' pane.
- Check the solutions in the course website, at the bottom of the corresponding chapter.
- Work on homework and/or any extra practice exercises!


