R Resources
Tidyverse resources
- Brianna Heggeseth’s COMP/STAT 112 website (with code examples and videos)
- R for Data Science
- Exploratory Data Analysis with R
- Johns Hopkins Tidyverse course text
Some example code
Creating new variables
case_when() from the dplyr package is a versatile function for creating new variables from existing ones. It can produce categorical or quantitative variables, and it can recode several variables into scores that are then combined into an index.
# Load dplyr for mutate(), case_when(), and the %>% pipe
library(dplyr)

# Turn quant_var into a Low/Med/High version
data <- data %>%
  mutate(cat_var = case_when(
    quant_var < 10 ~ "Low",
    quant_var >= 10 & quant_var <= 20 ~ "Med",
    quant_var > 20 ~ "High"
  ))
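The Low/Med/High recode above leaves a row as NA when no condition matches it, which includes rows where quant_var itself is missing. The following is a minimal sketch of one way to handle that, using a made-up data frame named toy (hypothetical, not part of the course data) and a final catch-all condition. Because case_when() uses the first condition that is true, the "Med" line does not need to repeat the lower bound.

```r
library(dplyr)

# Hypothetical toy data, including one missing value
toy <- data.frame(quant_var = c(5, 10, 20, 25, NA))

toy <- toy %>%
  mutate(cat_var = case_when(
    quant_var < 10  ~ "Low",
    quant_var <= 20 ~ "Med",     # only reached when quant_var >= 10
    quant_var > 20  ~ "High",
    TRUE            ~ "Missing"  # catch-all for rows matched by nothing above
  ))

print(toy)
```

The label "Missing" is an illustrative choice. Leaving out the catch-all line keeps those rows as NA, which is often the better option when the variable feeds into a model.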
# Turn cat_var (A, B, C categories) into another categorical variable
# (collapse A and B into one category)
data <- data %>%
  mutate(new_cat_var = case_when(
    cat_var %in% c("A", "B") ~ "A or B",
    cat_var == "C" ~ "C"
  ))
# Turn a categorical variable (x1) encoded as a numerical 0/1/2 variable
# into a different quantitative variable.
# Doing this for multiple variables allows you to create an index.
data <- data %>%
  mutate(x1_score = case_when(
    x1 == 0 ~ 10,
    x1 == 1 ~ 20,
    x1 == 2 ~ 50
  ))
# Add together multiple variables with mutate
data <- data %>%
  mutate(index = x1_score + x2_score + x3_score)
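The scoring and index steps can be tried end to end on a small example. This is a sketch under stated assumptions: the data frame toy, its values, and the helper score_item() are hypothetical, and it assumes all three items share the same 0/1/2 to 10/20/50 scoring. It uses dplyr's across() to apply the same recode to several columns at once rather than writing one case_when() per variable.

```r
library(dplyr)

# Hypothetical toy data: three items, each coded 0/1/2
toy <- data.frame(
  x1 = c(0, 1, 2),
  x2 = c(0, 2, 2),
  x3 = c(1, 0, 2)
)

# Hypothetical helper: map a 0/1/2 code to a 10/20/50 score
score_item <- function(x) {
  case_when(
    x == 0 ~ 10,
    x == 1 ~ 20,
    x == 2 ~ 50
  )
}

toy_scored <- toy %>%
  mutate(
    # Creates x1_score, x2_score, x3_score from x1, x2, x3
    across(c(x1, x2, x3), score_item, .names = "{.col}_score"),
    # Sum the three scores into a single index
    index = x1_score + x2_score + x3_score
  )

print(toy_scored)
```

If the items use different scoring rules, a separate case_when() for each variable, as in the example above, is the more direct approach.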