Topic 3 Effective Visualizations

Learning Goals

  • Understand and apply the guiding principles of effective visualizations

You can download a template .Rmd of this activity here. Put the file in a Other_Activities folder within your COMP_STAT_112 folder.

Effective Visualizations

Benefits of Visualizations

Visualizations help us understand what we’re working with:

  • What are the scales of our variables?
  • Are there any outliers, i.e. unusual cases?
  • What are the patterns among our variables?

This understanding will inform our next steps:

  • What method of analysis / model is appropriate?

Once our analysis is complete, visualizations are a powerful way to communicate our findings and tell a story.

Analysis of Graphics

There is not one right way to visualize a data set.

However, there are guiding principles that distinguish between “good” and “bad” graphics.

One of the best ways to learn is by reading graphics and determining which ways of arranging thing are better or worse. So before jumping directly into theoretical principles, let’s try some critical analysis on specific examples.

Example 3.1 For your assigned graphics or sets of graphics, identify the following:

  1. the story the graphic is aiming to communicate to the audience
  2. effective features of the graphic
  3. areas for improvement
Source: http://viz.wtf/

Figure 3.1: Source: http://viz.wtf/




Source: N. Yau, *Visualize This*, 2011, p. 223-225.

Figure 3.2: Source: N. Yau, Visualize This, 2011, p. 223-225.




Source: N. Yau, *Visualize This*, 2011, p. 242.

Figure 3.3: Source: N. Yau, Visualize This, 2011, p. 242.




Gun deaths.

Figure 3.4: Gun deaths.




Source: N. Yau, *Visualize This*, 2011, p. 150.

Figure 3.5: Source: N. Yau, Visualize This, 2011, p. 150.




Source: C. N. Knaflic, *Storytelling with Data*, 2015, p. 142.

Figure 3.6: Source: C. N. Knaflic, Storytelling with Data, 2015, p. 142.




Source: S. Few, *Now You See It*, 2009, p. 45.

Figure 3.7: Source: S. Few, Now You See It, 2009, p. 45.




Climate change.

Figure 3.8: Climate change.




Source: C. N. Knaflic, *Storytelling with Data*, 2015, p. 48.

Figure 3.9: Source: C. N. Knaflic, Storytelling with Data, 2015, p. 48.




Diamond data visualizations from [*R for Data Science*](http://r4ds.had.co.nz/data-visualisation.html#position-adjustments), 2017Diamond data visualizations from [*R for Data Science*](http://r4ds.had.co.nz/data-visualisation.html#position-adjustments), 2017Diamond data visualizations from [*R for Data Science*](http://r4ds.had.co.nz/data-visualisation.html#position-adjustments), 2017Diamond data visualizations from [*R for Data Science*](http://r4ds.had.co.nz/data-visualisation.html#position-adjustments), 2017Diamond data visualizations from [*R for Data Science*](http://r4ds.had.co.nz/data-visualisation.html#position-adjustments), 2017

Figure 3.10: Diamond data visualizations from R for Data Science, 2017




Source: S. Few, *Now You See It*, 2009, p. 37.

Figure 3.11: Source: S. Few, Now You See It, 2009, p. 37.




Source: N. Yau, *Visualize This*, 2011, p. 249.

Figure 3.12: Source: N. Yau, Visualize This, 2011, p. 249.




Source: S. Few, *Now You See It*, 2009, p. 61.

Figure 3.13: Source: S. Few, Now You See It, 2009, p. 61.




Source: C. N. Knaflic, *Storytelling with Data*, 2015, p. 68.

Figure 3.14: Source: C. N. Knaflic, Storytelling with Data, 2015, p. 68.




Source: C. N. Knaflic, *Storytelling with Data*, 2015, p. 81.

Figure 3.15: Source: C. N. Knaflic, Storytelling with Data, 2015, p. 81.




Source: http://viz.wtf/

Figure 3.16: Source: http://viz.wtf/




Source: A. Cairo, *The Functional Art*, 2013, p. 340.

Figure 3.17: Source: A. Cairo, The Functional Art, 2013, p. 340.




Source: N. Yau, *Visualize This*, 2011, p. 220.

Figure 3.18: Source: N. Yau, Visualize This, 2011, p. 220.




More examples:

Properties of Effective Visualizations

Storytelling / Context

Remember …

Graphics are designed by the human expert (you!) in order to reveal information that’s in the data.

Your choices depend on what information you want to reveal and convey. So before you complete a graphic, you should clearly identify what story you want the graphic to tell to the audience, and double check that this story is being told.1

Here is a nice example from FiveThirtyEight where each chart tells a story in answer to a particular question about the [then] upcoming German election.

Here is an interactive visualization that tells a story about gun violence.

Another important contextual question to ask is whether the graphic is for an explanatory (explain why) or exploratory (discovering something new) analysis.

Accessibility

In addition to considering the story you are telling, you need to consider what audiences can access your story.

Alternative (Alt) Text: In order for data visualizations to be accessible to people who are blind and use screen readers, we can provide alt text. Alt text should concisely articulate (1) what your visualization is (e.g. a bar chart showing which the harvest rate of cucumbers), (2) a one sentence description of the what you think is the most important takeaway your visualization is showing, and (3) a link to your data source if it’s not already in the caption (check out this great resource on writing alt text for data visualizations).

To add the alt text to your the HTML created from knitting the Rmd, you can include it as an option at the top of your r chunk. For example: {r, fig.alt = “Bar chart showing the daily harvest of cucumbers. The peak cucumber collection day is August 18th”}. To see the alt text in the knitted html file, you can hover over the image.

Color-blind friendly color palettes: In order for data visualizations to be accessible to people with color blindness, we need to be thoughtful about the colors we use in our data visualizations. The most common variety of color-blindness makes it hard for individuals to detect differences between red and green. Some types make it hard detect differences between blue and yellow. Other types make it hard to see different shades of a color.

This Chromatic Vision Simulator can give you a sense of how this could impact your perception of colors (see image below). You could also upload a visualization to this simulator to see how well your chosen color palette works.

Four images of the painting of Wheatfield with a Reaper by Vincent van Gogh under different types of color blindness. Source: https://asada.website/webCVS/

Try to use a color-blind friendly / safe palette whenever possible. One easy way to do this is to include + scale_fill_viridis_d() or + scale_color_viridis_d() when you are filling or coloring by a discrete or categorical variable and + scale_fill_viridis_c() or + scale_color_viridis_c() when you are filling or coloring by a continuous or quantitative variable. There are many other color-blind friendly palettes in R; you can check out other resources here.

Ethics

Michael Correll of Tableau Research writes “Data visualizations have a potentially enormous influence on how data are used to make decisions across all areas of human endeavor.” in his article from 2018.

Visualization operates at the intersection of science, communication, and data science & statistics. There are professional standards of ethics in these fields of the power they hold over other people as it relates to making data-driven decisions.

Correll describes three ethical challenges of visualization work:

  1. Visibility Make the invisible visible
  • Visualize hidden labor
  • Visualize hidden uncertainty
  • Visualize hidden impacts

Visualizations can be complex and one must consider the accessibility of the visualization to the audience. Managing complexity is, therefore, a virtue in design that can be in direct opposition with the desire to visualize the invisible.

  1. Privacy Collect data with empathy
  • Encourage Small Data
  • Anthropomorphize data
  • Obfuscate data to protect privacy

Restricting the type and amount of data that is collected has a direct impact on the quality and scope of the analyses hence obligation to provide context, and analytical power can, therefore, stand in direct opposition to the empathic collection of data.

  1. Power Challenge structures of power
  • Support data due process.
  • Act as data advocates.
  • Pressure unethical analytical behavior.

The goal of promoting truth and suppressing falsehood may require amplifying existing structures of expertise and power, and suppressing conflicts for the sake of rhetorical impact.

At a minimum, you should always

  1. Present data in a way that avoids misleading the audience.

  2. Always include your data source. Doing so attributes credit for labor, provides credibility to your work, and provides context for your graphic.

Design

A basic principle is that a graphic is about comparison. Good graphics make it easy for people to perceive things that are similar and things that are different. Good graphics put the things to be compared “side-by-side,” that is, in perceptual proximity to one another. The following aesthetics are listed in roughly descending order of human ability to perceive and compare nearby objects:2

  1. Position
  2. Length
  3. Angle
  4. Direction
  5. Shape (but only a very few different shapes)
  6. Area
  7. Volume
  8. Shade
  9. Color

Color is the most difficult, because it is a 3-dimensional quantity. We are pretty good at color gradients, but discrete colors must be selected carefully. We need to be particularly aware of red/green color blindness issues.

Visual perception and effective visualizations

Here are some facts to keep in mind about visual perception from Now You See It:

  1. Visual perception is selective, and our attention is often drawn to constrasts from the norm.
Black patterns on white background demonstrating visual perpection is drawn toward contrast. Originally from C. Ware, *Information Visualization: Perception for Design*, 2004? Source: S. Few, *Now You See It*, 2009, p. 33.

Figure 3.19: Our attention is drawn to contrasts to the norm. What stands out in this example image?, which is originally from C. Ware, Information Visualization: Perception for Design, 2004? Source: S. Few, Now You See It, 2009, p. 33.

Implication: We should design visualizations so that the features we want to highlight stand out in contrast from those that are not worth the audience’s attention.

  1. Our eyes are drawn to familiar patterns. We see what we know and expect.
Rose with an embedded shadow of a dophin demonstraing that visual perpection focuses on familiar patterns. From coolbubble.com. Source: S. Few, *Now You See It*, 2009, p. 34.

Figure 3.20: Do you see anything embedded in this rose image from coolbubble.com? Source: S. Few, Now You See It, 2009, p. 34.

Implication: Visualizations work best when they display information as patterns that familiar and easy to spot.

  1. Memory plays an important role in human cognition, but working memory is extremely limited.

Implication: Visualizations must serve as external aids to augment working memory. If a visualization is unfamiliar, then it won’t be as effective.

Gestalt principles

The Gestalt principles (more info here or here) were developed by psychologists including Max Wertheimer in the early 1900s to explain how humans perceive organized patterns and objects.

In a design setting, they help us understand how to incorporate preattentive features into visualizations. The figure below shows some preattentive features, all of which are processed prior to conscious attention (“at a glance”) and can help the reader focus on relevant information in a visualization.

Visual examples of preattentive features based on the Gestalt principles. Source: I. Meirelles, *Design for Information*, 2013, p. 23.

Figure 3.21: Preattentive features based on the Gestalt principles. Source: I. Meirelles, Design for Information, 2013, p. 23.

Other design tips from Visualize This and Storytelling with Data:

  • Put yourself in a reader’s shoes when you design data graphics. What parts of the data need explanation? We can minimize ambiguity by providing guides, label axes, etc.
  • Data graphics are meant to shine a light on your data. Try to remove any elements that don’t help you do that. That is, eliminate “chart junk” (distracting and unnecessary adornments).
  • Vary color and stroke styles to emphasize the parts in your graphic that are most important to the story you’re telling
  • It is easier to judge length than it is to judge area or angles
  • Be thoughtful about how your categories (levels) are ordered for categorical data. There may be a natural ordering
  • Pie charts, donut charts, and 3D are evil

Basic Rules for Constructing Graphics

Instead of memorizing which plot is appropriate for which situation, it’s best to simply start to recognize patterns in constructing graphics:

  • Each quantitative variable requires a new axis.
  • Each categorical variable requires a new way to “group” the graphic (eg: using colors, shapes, separate facets, etc to capture the grouping).
  • For visualizations in which overlap in glyphs or plots obscures the patterns, try faceting or transparency.

Still to Come

While we will not cover all of visualization theory – you can take a whole course on that at Macalester and it is a proper field in its own right – we will touch on the following types of visualizations in the coming weeks:

  • Bivariate visualizations
  • Visualizations of higher dimensional data
  • Temporal structures: timelines and flows
  • Hierarchical structures: trees
  • Relational structures: networks
  • Spatial structures: maps
  • Spatio-temporal structures
  • Textual structures
  • Interactive graphics (e.g., gganimate, shiny)

More Practice

Example 3.2 Consider one of the more complicated data graphics listed at (http://mdsr-book.github.io/exercises.html#exercise_25):

  1. What story does the data graphic tell? What is the main message that you take away from it?
  2. Can the data graphic be described in terms of the Grammar of Graphics (frame, glyphs, aesthetics, facet, scale, guide)? If so, please describe.
  3. Critique and/or praise the visualization choices made by the designer. Do they work? Are they misleading? Thought-provoking? Brilliant? Are there things that you would have done differently? Justify your response.
Solution

Example: http://hint.fm/wind/

  1. The dynamic visual shows the chaos and beauty of the wind. Since the graphic is updated regularly, the exact wind patterns will be different so the overall message is that there is an interesting relationship between topography and wind with a large amount of uncertainty and chaos.

  2. frame: longitude (x) and latitude (y); glyph: paths/lines, geography boundaries (lower 48 states), circles for cities, text for city names; aesthetics: white color of paths/lines, speed of animation path/line, size of white city circles corresponds to population, dark grey color for geographic polygon boundary, white outlined black text of city names; no facets; guide for speed of animation of paths, no guide for city circle size.

  3. Since the graphic is only for one time and date, you can’t compare how the patterns change over time. It would be interesting to include elevation in the background map to see how the wind patterns are impacted by the topography.


  1. A “negative” result (e.g., there is no correlation between two variables) is a perfectly fine story to tell.↩︎

  2. This list is from B. S. Baumer, D. T. Kaplan, and N. J. Horton, Modern Data Science with R, 2017, p. 15. For more of the theory of perception, see also W.S. Cleveland and R. McGill, “Graphical perception: Theory, experimentation, and application to the development of graphical methods,” Journal of the American Statistical Association, 1984.↩︎