R Resources

Tidyverse resources

Visualization resources

Example code

Creating new variables

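case_when() from the dplyr package is a versatile function for creating new variables from existing ones. It checks its conditions in order and assigns the value paired with the first condition that is true. This makes it useful for turning quantitative variables into categories, collapsing or recoding categories, and building indices from multiple variables.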

# case_when(), mutate(), and %>% are all available once dplyr is loaded
library(dplyr)

# Turn quant_var into a Low/Med/High version
data <- data %>%
    mutate(cat_var = case_when(
            quant_var < 10 ~ "Low",
            quant_var >= 10 & quant_var <= 20 ~ "Med",
            quant_var > 20 ~ "High"
        )
    )
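
To see how the conditions are matched, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are assumptions for illustration only. Because case_when() uses the first condition that is true, the second condition can be shortened to `quant_var <= 20`: it is reached only when `quant_var < 10` was not true.

```r
library(dplyr)

# Hypothetical toy data; the NA is included on purpose
toy <- data.frame(quant_var = c(3, 10, 15, 20, 27, NA))

toy <- toy %>%
    mutate(cat_var = case_when(
            quant_var < 10 ~ "Low",
            quant_var <= 20 ~ "Med",   # only reached when quant_var is not < 10
            quant_var > 20 ~ "High"
        )
    )

toy$cat_var
# "Low" "Med" "Med" "Med" "High" NA
```

The boundary values 10 and 20 both land in "Med", and the missing value stays missing because none of the conditions evaluates to TRUE for it.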

# Turn cat_var (A, B, C categories) into another categorical variable
# (collapse A and B into one category)
data <- data %>%
    mutate(new_cat_var = case_when(
            cat_var %in% c("A", "B") ~ "A or B",
            cat_var == "C" ~ "C"
        )
    )
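
Any value that matches none of the conditions becomes NA. If a catch-all category is wanted instead, one option is a final `TRUE ~ ...` line, sketched below. The data frame `toy`, the extra category "D", and the label "Other" are assumptions made up for this illustration.

```r
library(dplyr)

# Hypothetical toy data with an unexpected category "D"
toy <- data.frame(cat_var = c("A", "B", "C", "D"), stringsAsFactors = FALSE)

toy <- toy %>%
    mutate(new_cat_var = case_when(
            cat_var %in% c("A", "B") ~ "A or B",
            cat_var == "C" ~ "C",
            TRUE ~ "Other"   # fallback for anything not matched above
        )
    )

toy$new_cat_var
# "A or B" "A or B" "C" "Other"
```

Leaving the fallback out is often the safer choice while checking data, since unexpected categories then show up as NA rather than being silently grouped together.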

# Turn a categorical variable (x1), encoded numerically as 0/1/2, into a
# quantitative score. Doing this for several variables lets you build an index.
data <- data %>%
    mutate(x1_score = case_when(
            x1==0 ~ 10,
            x1==1 ~ 20,
            x1==2 ~ 50
        )
    )

# Add the score variables together with mutate to form the index
# (x2_score and x3_score are assumed to be created the same way as x1_score)
data <- data %>%
    mutate(index = x1_score + x2_score + x3_score)
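
Putting the last two steps together, a minimal end-to-end sketch might look like the following. The data frame `toy` and the helper `score_item()` are hypothetical, and the sketch assumes all three items share the same 0/1/2 to 10/20/50 scoring, which may not hold for a real index.

```r
library(dplyr)

# Hypothetical toy data: three items, each coded 0/1/2
toy <- data.frame(
    x1 = c(0, 1, 2),
    x2 = c(0, 2, 2),
    x3 = c(0, 0, 1)
)

# Hypothetical helper: assumes every item uses the same scoring rule
score_item <- function(x) {
    case_when(
        x == 0 ~ 10,
        x == 1 ~ 20,
        x == 2 ~ 50
    )
}

toy <- toy %>%
    mutate(
        x1_score = score_item(x1),
        x2_score = score_item(x2),
        x3_score = score_item(x3),
        index = x1_score + x2_score + x3_score
    )

toy$index
# 30 80 120
```

Note that `+` propagates missing values: if any one of the scores is NA for a row, the index for that row is NA as well.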