Bivariate Visualizations

Brianna Heggeseth

Announcements

Feedback/Assessments

  • Assignment 1 feedback will be available in individual spreadsheets soon
  • Tidy Tuesday (TT1) feedback is in individual spreadsheets
  • Tidy Tuesday (TT2) is posted today (Moodle)
    • Data set: CRAN R Packages

Check it out and let me know if you encounter any issues!

Learning Goals

  • Identify appropriate types of bivariate visualizations, depending on the type of variables (categorical, quantitative)
  • Create basic bivariate visualizations based on real data

Alt Text for Visualizations

I want you to practice writing alt text for all of the visualizations you create.

You can add alt text to your document by adding fig.alt="alt text here" after the r in {r} at the top of an R chunk.
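As a minimal sketch of where fig.alt goes, the chunk header is written as a comment below because it belongs to the .Rmd file rather than to the R code inside the chunk. The data frame toy, its column won, and the alt text wording are hypothetical and only there for illustration.

```r
library(ggplot2)

# In the .Rmd, the opening line of the chunk would read something like:
#   {r, fig.alt = "Bar plot of counts of TRUE and FALSE values. TRUE is more common."}
# The R code inside the chunk does not change.

# Hypothetical toy data, for illustration only (not the election data)
toy <- data.frame(won = c(TRUE, TRUE, TRUE, FALSE))

# A simple bar plot that the alt text above would describe
p_alt <- ggplot(toy, aes(x = won)) +
  geom_bar()
p_alt
```

Good alt text usually names the plot type, the variables shown, and the main takeaway.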

Bivariate Visualizations

In this activity, we will analyze data from the 2016 presidential election.

We’ll explore county-level election outcomes and demographics.

Template File

Go to the course website and open today’s activity.

Download the template .Rmd of this activity. Put the file in the Assignment_03 folder within your COMP_STAT_112 folder.

  • Add your name and your tablemates’ names as collaborators.
  • This .Rmd only contains examples that we’ll work on in class and exercises you’ll finish for Assignment 3.

Getting to know the dataset

Loading in the Data

library(readr) # provides read_csv()
elect <- read_csv("https://bcheggeseth.github.io/112_fall_2023/data/electionDemographics16.csv")

Getting to know the dataset

Check out the first rows of elect. What are the units of observation?

head(elect)
# A tibble: 6 × 34
  county      total_2008 dem_2008 gop_2008 oth_2008 total_2012 dem_2012 gop_2012
  <chr>            <dbl>    <dbl>    <dbl>    <dbl>      <dbl>    <dbl>    <dbl>
1 Walker Cou…      28652     7420    20722      510      28497     6551    21633
2 Bullock Co…       5415     4011     1391       13       5318     4058     1250
3 Calhoun Co…      49242    16334    32348      560      46240    15500    30272
4 Barbour Co…      11630     5697     5866       67      11459     5873     5539
5 Fayette Co…       7957     1994     5883       80       7912     1803     6034
6 Baldwin Co…      81413    19386    61271      756      84988    18329    65772
# ℹ 26 more variables: oth_2012 <dbl>, total_2016 <dbl>, dem_2016 <dbl>,
#   gop_2016 <dbl>, oth_2016 <dbl>, perdem_2016 <dbl>, perrep_2016 <dbl>,
#   winrep_2016 <lgl>, perdem_2012 <dbl>, perrep_2012 <dbl>, winrep_2012 <lgl>,
#   perdem_2008 <dbl>, perrep_2008 <dbl>, winrep_2008 <lgl>, region <dbl>,
#   total_population <dbl>, percent_white <dbl>, percent_black <dbl>,
#   percent_asian <dbl>, percent_hispanic <dbl>, per_capita_income <dbl>,
#   median_rent <dbl>, median_age <dbl>, polyname <chr>, abb <chr>, …

Getting to know the dataset

How much data do we have?

dim(elect)
[1] 3112   34

Getting to know the dataset

What are the names of the variables?

names(elect)
 [1] "county"            "total_2008"        "dem_2008"         
 [4] "gop_2008"          "oth_2008"          "total_2012"       
 [7] "dem_2012"          "gop_2012"          "oth_2012"         
[10] "total_2016"        "dem_2016"          "gop_2016"         
[13] "oth_2016"          "perdem_2016"       "perrep_2016"      
[16] "winrep_2016"       "perdem_2012"       "perrep_2012"      
[19] "winrep_2012"       "perdem_2008"       "perrep_2008"      
[22] "winrep_2008"       "region"            "total_population" 
[25] "percent_white"     "percent_black"     "percent_asian"    
[28] "percent_hispanic"  "per_capita_income" "median_rent"      
[31] "median_age"        "polyname"          "abb"              
[34] "StateColor"       

Review: Univariate Viz

Categorical Variable: Counts/Frequencies & Bar Plot

table(elect$winrep_2016)

FALSE  TRUE 
  487  2625 
table(elect$winrep_2016) / 3112 

   FALSE     TRUE 
0.156491 0.843509 
library(ggplot2)
# Construct a bar chart (a visual summary) of this variable.
ggplot(elect, aes(x = winrep_2016)) +
  geom_bar()

Bar plot of the count of U.S. counties that Trump won (represented by TRUE) or lost (represented by FALSE) in 2016. Trump won the vast majority of U.S. counties in 2016. Election return data from https://github.com/tonmcg/County_Level_Election_Results_12-16.

Trump’s county-level wins and losses in 2016.

Try writing some alt text!

  • Let’s use a screen reader to see my alt text in action!
  • We can also right-click and press Inspect (on Chrome).
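The proportions on this slide come from table(elect$winrep_2016) / 3112, where 3112 is the number of rows reported by dim(elect). One alternative, sketched here, is base R's prop.table(), which divides by the table's own total so the count does not need to be typed by hand. The vector winrep_toy is a made-up stand-in for elect$winrep_2016.

```r
# Hypothetical stand-in for elect$winrep_2016 (made-up values)
winrep_toy <- c(TRUE, TRUE, TRUE, FALSE, TRUE)

counts <- table(winrep_toy)   # counts of FALSE and TRUE
props <- prop.table(counts)   # each count divided by the total count
props

# Equivalent: divide by the number of observations
counts / length(winrep_toy)
```

With the real data, the same pattern would be prop.table(table(elect$winrep_2016)).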

Review: Univariate Viz

Quantitative Variable: Histogram or Density plot

  • Summary of typical value, variation, and unusual features

Histogram of percentage of votes that were Republican within a U.S. county in 2016 presidential election. Most counties had between 50 and 75% of the vote go Republican.

U.S. county-level presidential vote percentage that went Republican in 2016.

Density plot of percentage of votes that were Republican within a U.S. county in 2016 presidential election. Most counties had between 50 and 75% of the vote go Republican.

U.S. county-level presidential vote percentage that went Republican in 2016.
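The slides do not show the code behind these two plots. A minimal sketch of how a histogram and a density plot could be built with ggplot2 follows. The data frame toy holds made-up values under the real column name perrep_2016, and the bin width and axis labels are assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for elect (made-up percentages, not real results)
toy <- data.frame(perrep_2016 = c(35, 48, 55, 61, 64, 68, 72, 75, 81))

# Histogram: each bar counts the observations that fall in one bin
p_hist <- ggplot(toy, aes(x = perrep_2016)) +
  geom_histogram(binwidth = 10) +
  labs(x = "Percent Republican vote, 2016", y = "Number of counties")

# Density plot: a smoothed picture of the same distribution
p_dens <- ggplot(toy, aes(x = perrep_2016)) +
  geom_density() +
  labs(x = "Percent Republican vote, 2016")

p_hist
p_dens
```

With the real data, replace toy with elect. Only the geom layer differs between the two plots.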

Preview: Bivariate Viz

Quantitative + Quantitative Variable: Scatterplot

Scatter plot of Republican vote percent in U.S. counties in 2012 and 2016 labeled according to state. There is a strong positive relationship and Utah counties tended to have a lower Republican vote percentage in 2016 than what you'd expect given 2012.

U.S. county-level presidential vote percentage that went Republican in 2012 and 2016
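A plot like this one could be sketched as follows. This assumes the state labels come from the abb column drawn with geom_text(), which may differ from how the original figure was made. The data frame toy contains made-up values under the real column names.

```r
library(ggplot2)

# Hypothetical stand-in for elect (made-up values, not real results)
toy <- data.frame(
  perrep_2012 = c(40, 55, 62, 70, 73),
  perrep_2016 = c(38, 60, 66, 52, 55),
  abb = c("MN", "WI", "IA", "UT", "UT")
)

# Two quantitative variables: one on each axis.
# geom_text() draws the state abbreviation in place of a point.
p_scatter <- ggplot(toy, aes(x = perrep_2012, y = perrep_2016)) +
  geom_text(aes(label = abb)) +
  labs(x = "Percent Republican vote, 2012", y = "Percent Republican vote, 2016")
p_scatter
```

Swapping geom_text(aes(label = abb)) for geom_point() gives an ordinary scatterplot of points.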

Preview: Bivariate Viz

Quantitative + Categorical Variable: Density Plots, Boxplots, etc.

Density plots of Republican vote percent in U.S. counties in 2016 separated by state voting history categorized as blue, purple, or red. Historically red states tend to have a higher Republican vote percentage in 2016 than purple swing states or blue Democratic states.

Republican vote percent in U.S. counties in 2016 separated by state voting history.
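One way to sketch overlaid density plots for a quantitative variable split by a categorical one is to map the category to fill. The data frame toy is made up, the category values blue, purple, and red are assumed from the description above, and the transparency and color choices are illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for elect (made-up values, not real results)
toy <- data.frame(
  perrep_2016 = c(30, 42, 51, 45, 55, 63, 60, 71, 82),
  StateColor = rep(c("blue", "purple", "red"), each = 3)
)

# One density curve per category; alpha makes overlapping curves visible
p_groups <- ggplot(toy, aes(x = perrep_2016, fill = StateColor)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c(blue = "blue", purple = "purple", red = "red")) +
  labs(x = "Percent Republican vote, 2016", fill = "State voting history")
p_groups
```

A boxplot version would put StateColor on the x axis, perrep_2016 on the y axis, and use geom_boxplot().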

Preview: Bivariate Viz

Categorical + Categorical Variable: side-by-side, proportion bar plots, etc.

Proportional bar plots of percentage of U.S. counties that went for Trump in 2016 separated by state voting history. Historically red states tend to have a higher proportion of counties that went for Trump in 2016 than purple swing states or blue Democratic states.

Percentage of U.S. counties that went for Trump in 2016 separated by state voting history
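For two categorical variables, a proportional bar plot can be sketched by putting one variable on the x axis, mapping the other to fill, and setting position = "fill" so every bar is scaled to height 1. The data frame toy is made up and uses the real column names StateColor and winrep_2016.

```r
library(ggplot2)

# Hypothetical stand-in for elect (made-up values, not real results)
toy <- data.frame(
  StateColor = rep(c("blue", "purple", "red"), each = 3),
  winrep_2016 = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# position = "fill" rescales each bar so the segments show proportions
p_fill <- ggplot(toy, aes(x = StateColor, fill = winrep_2016)) +
  geom_bar(position = "fill") +
  labs(x = "State voting history", y = "Proportion of counties")
p_fill
```

Changing position = "fill" to position = "dodge" gives side-by-side bars of counts instead of proportions.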

In Class

Work on the activity, checking in with your tablemates.

Notice patterns in the code! Make sure you understand what each line of code is doing.

Feel free to make visualizations more effective as you go along.

After Class

Complete Exercises 1-8 for Assignment 3 (due next Wednesday).

For Thursday’s class, meet in the Idea Lab in the Library!