What Makes for Good Data Visualization?

Overview

What makes a good visualization? We want graphics to be eye-catching and informative. In this chapter we’ll discuss the aspects that affect the quality of your figures, along with considerations specific to the geosciences.

  1. The Importance of Data Visualization
  2. Publication Ready Figures
  3. The Problem with Rainbow Colormaps
  4. Misleading Visualizations

Prerequisites

Concepts | Importance | Notes
Matplotlib | Necessary |
Cartopy | Necessary |
  • Time to learn: 3 hours

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import geocat.viz as gv

The Importance of Data Visualization

It is important to use pictures to show data because we can visually detect patterns that could be lost in statistical analysis. All scientific disciplines use data visualizations to communicate concepts.

Here we have a figure from Autodesk that shows a “Datasaurus” and 12 other datasets that share the same summary statistics (mean, standard deviation, etc.). We can see immediately that, visually, they tell very different stories: be it a dinosaur, a star, an oval, concentric ovals, or a series of lines (perhaps weather fronts).

Same Stats
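
The same idea can be reproduced on a small scale. The sketch below does not use the Autodesk datasets; it builds two synthetic shapes (a circle and a pair of crossing lines, both chosen only for illustration) and standardizes them so that their means and standard deviations match by construction. The helper names `standardize` and `summary` are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def standardize(values):
    """Shift and scale values to mean 0 and standard deviation 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def summary(x, y):
    """Return mean and standard deviation of x and y, plus their correlation."""
    return np.array([x.mean(), y.mean(), x.std(), y.std(), np.corrcoef(x, y)[0, 1]])


# shape 1: points evenly spaced around a circle
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle_x, circle_y = standardize(np.cos(t)), standardize(np.sin(t))

# shape 2: two crossing lines (an "X")
u = np.linspace(-1, 1, 100)
cross_x = standardize(np.concatenate([u, u]))
cross_y = standardize(np.concatenate([u, -u]))

circle_stats = summary(circle_x, circle_y)
cross_stats = summary(cross_x, cross_y)
print("circle:", np.round(circle_stats, 6))
print("cross: ", np.round(cross_stats, 6))

# the statistics agree, but the pictures do not
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.scatter(circle_x, circle_y, s=5)
ax1.set_title("Circle")
ax2.scatter(cross_x, cross_y, s=5)
ax2.set_title("Crossing lines")
```

Call plt.show() (or let a notebook render the figure) to see the two panels. Only the plot reveals that the two datasets have nothing in common beyond their summary statistics.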

Publication Ready Figures

For your figure to be publication ready, you will probably want to change some of Matplotlib’s default plot settings, such as the font sizes of your titles and labels, the figure size, the subplot layout, or the colormap.

To demonstrate this, let’s look at an example:

# fake data
x = [0, 1, 2, 3, 4, 5]
y = [0, 3, 6, 9, 12, 15]
 
# plot
plt.plot(x, y)

# annotate
plt.title('Title')
plt.xlabel('X Label')
plt.ylabel('Y Label')

plt.show();

Now let’s show some customization options:

# fake data
x = [0, 1, 2, 3, 4, 5]
y = [0, 3, 6, 9, 12, 15]
 
# plot
plt.plot(x, y, '--', color='red')

# annotate
plt.title('Title', fontsize=20)
plt.xlabel('X Label', fontsize=16)
plt.ylabel('Y Label', fontsize=16)

plt.show();
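
Figure size and output resolution can be set in the same spirit. The following is a minimal sketch using Matplotlib’s object-oriented interface; the 6 by 4 inch size and 300 dpi are assumed values, not requirements of any particular journal, and the in-memory buffer stands in for a filename such as 'figure.png'.

```python
import io

import matplotlib.pyplot as plt

# fake data
x = [0, 1, 2, 3, 4, 5]
y = [0, 3, 6, 9, 12, 15]

# figsize is (width, height) in inches; these values are only an example
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, '--', color='red')

# annotate
ax.set_title('Title', fontsize=20)
ax.set_xlabel('X Label', fontsize=16)
ax.set_ylabel('Y Label', fontsize=16)

# write a high-resolution PNG; pass a filename instead of the buffer to save to disk
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=300, bbox_inches='tight')
```

Setting the size when the figure is created, rather than rescaling the image afterwards, keeps the font sizes in proportion to the rest of the plot.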

Matplotlib Global Parameters

Matplotlib has defaults for fontsizes and all sorts of attributes of a plot. Instead of setting your fontsize in every script, it is possible to set global parameters to change the default values of these attributes.

You can view the global parameter options and their current settings with:

print(mpl.rcParams)

To change any given parameter you would use the following command (replacing your parameter and value, of course):

mpl.rcParams['font.family'] = 'Arial'
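
If you only want the new defaults for part of a script, one option is Matplotlib’s rc_context, which applies a set of parameters temporarily and restores the previous values afterwards. This is a minimal sketch; the dictionary name `publication_style` and the values in it are hypothetical choices made for illustration.

```python
import matplotlib as mpl

# hypothetical style choices, picked only for illustration
publication_style = {
    "font.size": 12,
    "axes.titlesize": 20,
    "axes.labelsize": 16,
    "figure.figsize": (6, 4),
}

default_title_size = mpl.rcParams["axes.titlesize"]

# the settings apply only inside the with-block
with mpl.rc_context(publication_style):
    inside_title_size = mpl.rcParams["axes.titlesize"]

# outside the block, the previous value is back in place
restored_title_size = mpl.rcParams["axes.titlesize"]

print(default_title_size, inside_title_size, restored_title_size)
```

To make the same changes permanent for the rest of a session, the dictionary could instead be passed to mpl.rcParams.update().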

Using GeoCAT-Viz

The GeoCAT-Viz package also has many utility functions for making your plots look publication ready in fewer lines of code. The defaults of GeoCAT-Viz keyword arguments are set to resemble the style of NCL.

# fake data
x = [0, 1, 2, 3, 4, 5]
y = [0, 3, 6, 9, 12, 15]
 
# plot
plt.plot(x, y)

# annotate
plt.title('Title')
plt.xlabel('X Label')
plt.ylabel('Y Label')

gv.set_titles_and_labels(plt.gca())

plt.show();

The Problem with Rainbow Colormaps

Rainbow colormaps are visually beautiful, but are falling out of favor because

  1. They are not colorblind friendly and
  2. They do not print out in grayscale in a meaningful way.

Both of these issues can be addressed by being careful about your colormap’s lightness values.

Some colormap options are perceptually uniform (equal steps in the data are perceived as equal steps in color), sequential (lightness changes steadily, for example from lighter to darker), or diverging (lightest or darkest at a set point, with lightness changing uniformly moving away from it). A rainbow colormap, however, is lighter or darker in arbitrary places, which affects how we interpret the data (especially if it is printed in grayscale).

For example, from Matplotlib’s Choosing a Colormap documentation here are some “good” colormaps:

Matplotlib Logo

And here are miscellaneous colormaps:

Matplotlib Logo

Looking at the colors in grayscale helps to understand why we might prefer a sequentially ordered colormap. In the miscellaneous colormaps, some grayscale values are duplicated, so the reader cannot tell whether a given shade represents a high or a low value.
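
You can run a rough version of this grayscale check on any colormap yourself. The sketch below is an approximation, not the method used in Matplotlib’s documentation: it estimates lightness with the Rec. 601 luma weights applied to the RGB values, and the helper names `approximate_lightness` and `is_monotonic` are hypothetical. It compares 'viridis' with the rainbow colormap 'jet'.

```python
import numpy as np
import matplotlib.pyplot as plt


def approximate_lightness(cmap_name, n_samples=256):
    """Sample a colormap and return an approximate grayscale value per color."""
    rgba = plt.get_cmap(cmap_name)(np.linspace(0, 1, n_samples))
    red, green, blue = rgba[:, 0], rgba[:, 1], rgba[:, 2]
    # Rec. 601 luma weights: a rough stand-in for perceived lightness
    return 0.299 * red + 0.587 * green + 0.114 * blue


def is_monotonic(values):
    """True if the values never change direction (all rising or all falling)."""
    steps = np.diff(values)
    return bool(np.all(steps >= 0) or np.all(steps <= 0))


names = ["viridis", "jet"]
fig, axes = plt.subplots(len(names), 1, figsize=(6, 2))
for ax, name in zip(axes, names):
    gray = approximate_lightness(name)
    # draw the grayscale values as a horizontal strip
    ax.imshow(gray[np.newaxis, :], cmap="gray", aspect="auto", vmin=0, vmax=1)
    ax.set_axis_off()
    ax.set_title(f"{name}: monotonic lightness = {is_monotonic(gray)}", fontsize=10)
fig.tight_layout()
```

A colormap whose lightness rises and then falls maps two different data values to the same shade of gray, which is the duplication problem described above.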

Another consideration that can help those who are visually impaired is to make sure your figure comments are substantial. Use words to paint the picture of what is displayed, not just the conclusions you want the reader to get.

Misleading Visualizations

The scales or colors we choose to use for data visualization affect how people interpret figures. You should strive to make your visualizations as accurate and as informative as possible. Here are some examples that demonstrate just how different a figure can look based on these choices you make. Do not intentionally mislead your audience!

Perhaps the most common data visualization manipulation is to change the Y-scale. If a plot does not begin at 0, small changes in magnitude can be exaggerated. Similarly, a logarithmic scale changes how large differences appear. This is not always disingenuous: sometimes these changes are exactly what you want to highlight, the pattern you want to draw attention to. Just make sure the choice is appropriate for your use case and documented. Conversely, extending the Y-axis too far has the opposite effect and smooths out the differences in the data.

x = [1, 2, 3, 4, 5]
y = [1101, 1011, 1111, 1070, 1050]

# four stacked panels showing the same data with different y-axes
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4)

ax1.bar(x, y)
ax1.set_title("Default Y-Scale Starts at 0")

ax2.bar(x, y)
ax2.set_ylim(bottom=1000)
ax2.set_title("Y-Scale Starts at 1000")

ax3.bar(x, y)
ax3.set_yscale("log")
ax3.set_title("Y-Scale is Logarithmic")

ax4.bar(x, y)
ax4.set_ylim(0, 2000)
ax4.set_title("Y-Scale is Extended")

# adjust the spacing once the titles exist so the panels do not overlap
fig.tight_layout()
plt.show();
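
To put a number on the exaggeration, you can compare the drawn heights of the tallest and shortest bars for different axis baselines. This is a small arithmetic sketch using the largest and smallest values from the data above; the function name `apparent_ratio` is hypothetical.

```python
def apparent_ratio(larger, smaller, baseline=0.0):
    """Ratio of drawn bar heights when the y-axis starts at `baseline`."""
    return (larger - baseline) / (smaller - baseline)


# largest and smallest values from the bar chart data
tallest, shortest = 1111, 1011

ratio_from_zero = apparent_ratio(tallest, shortest)
ratio_from_1000 = apparent_ratio(tallest, shortest, baseline=1000)

print(f"axis starts at 0:    tallest bar looks {ratio_from_zero:.2f}x the shortest")
print(f"axis starts at 1000: tallest bar looks {ratio_from_1000:.2f}x the shortest")
```

With the axis starting at 0 the tallest bar is drawn about 1.1 times as high as the shortest, which matches the data. With the axis starting at 1000 it is drawn about 10 times as high, even though the underlying values differ by roughly 10 percent.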

Other examples of data visualization manipulation include improper scaling, cherry-picking a small non-representative subset of the data to display, displaying pie charts at a slant (pie charts are hard to interpret accurately as is), and using unexpected colormaps.


Summary

It is important to have accurate, engaging, and representative data visualization to accompany your research: for data exploration as part of the scientific process, for communication of results, and for education and outreach efforts. Visually we pick up on patterns that statistics alone may not convey. However, an overreliance on data visualization can make science less accessible to those with vision disabilities. It is important to be cognizant of the patterns our minds pick up, be it based on color or y-axis scaling, so that we can avoid misleading our audience and more accurately convey the narrative inherent to the data.

What’s next?

Let’s break down the different components of data visualization in Plot Elements.

Resources and references