Engaging and effective data visualisations

WBS Behavioural Science Summer School
4-5 June 2024

Dr Nicola Rennie

About me

Lecturer in Health Data Science within the Centre for Health Informatics, Computing, and Statistics.


Background in statistics, operational research (transport), and data science consultancy.


Collaborate with local NHS trusts on data science projects.


Co-author of Royal Statistical Society’s Best Practices for Data Visualisation guidance.

CHICAS logo

But first, some background!

How did the Royal Statistical Society’s Best Practices for Data Visualisation guide start?

A survey in 2021 asked Royal Statistical Society (RSS) members their views on Significance magazine.

Respondents were asked, “What aspects of content could be improved?”

  • “Better, more consistent charts… I’d like to see a house style like The Economist”
  • “The figures are often difficult to read…”
  • “The plots sometimes look amateurish…”

What’s the aim of the guide?

The guide would:

  • Help contributors develop data visualisations that are high quality, readable, effective at conveying information, and fulfil their intended purpose.
  • Summarise and link to authoritative advice on chart styles and formats for different types of data.
  • Show how to override software defaults in common data visualisation software and packages.

Skip forward quite a few months…

Screenshot of data vis guide homepage

The rest of this talk…

In this session we will cover…

  • why you should visualise data;

  • some guidelines for making better charts;

  • examples of good and bad charts!

The role of visualisation

Why visualise data?

Data visualisation has two main purposes:

  • Exploratory data analysis and identifying data issues
  • Communicating insights and results

book shelf cartoon

Exploratory data visualisation

Because summary statistics aren’t enough…

Dataset            A            B
mean_x    54.2632732   54.2658818
mean_y    47.8322528   47.8314957
sd_x      16.7651420   16.7688527
sd_y      26.9354035   26.9386081
cor_xy    -0.0644719   -0.0686092
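
As a minimal sketch of the same idea, the snippet below uses two sets from Anscombe's quartet as a stand-in (an assumption: these are not the datasets summarised in the table above). Both sets share the same x values and give matching summary statistics to two decimal places, yet one is roughly linear and the other follows a curve, which only a plot would reveal.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's quartet, sets I and II (stand-in data, not the datasets in the table above).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


def summarise(xs, ys):
    """Summary statistics rounded to two decimal places."""
    return {
        "mean_y": round(mean(ys), 2),
        "sd_y": round(stdev(ys), 2),
        "cor_xy": round(pearson(xs, ys), 2),
    }


summary_1 = summarise(x, y1)
summary_2 = summarise(x, y2)
print(summary_1)
print(summary_2)
```

Plotting y1 and y2 against x (for example as two scatter plots side by side) shows the difference that the rounded summaries hide.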


See also “A hypothesis is a liability”: genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02133-w

Communicating insights with data visualisation

Grab attention

Visualisations stand out. If a reader is short on time or uncertain about whether a document is of interest, an attention-grabbing visualisation may entice them to start reading.

Improve access to information

Textual descriptions can be lengthy and hard to read, and are frequently less precise than a visual depiction showing data points and axes.

Summarise content

Visual displays allow for summarising complex textual content, aiding the reader in memorising key points.

Communicating insights with data visualisation

John Snow collected data on cholera deaths and created a visualisation where the number of deaths was represented by the height of a bar at the corresponding address in London.

This visualisation showed that the deaths clustered around Broad Street, which helped identify the source of the outbreak: the Broad Street water pump.

Snow. 1854.

John Snow cholera map

Why do pie charts have a bad reputation?

“All variations [of pie charts] lead to overestimation of small values and underestimation of large ones.” Kosara and Skau (2016)

Why do 3D charts have a bad reputation?

On the plot on the left, how tall is the bar?

Two 3D bar charts

What are you trying to communicate?

Data visualisations must serve a purpose.

Ask yourself:

  • What is the purpose?
  • Does the visualisation support the purpose?
  • Is it quick, accurate, and intuitive?

Elements of charts

  • Layout
  • Aspect ratio
  • Lines
  • Points
  • Colours
  • Axes
  • Symbols
  • Legends
  • Orientation
  • Auxiliary elements
  • Dimensionality

Layouts, aspect ratios, and axes

Longer labels are best placed on the y-axis and written horizontally.
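
One way to put long labels on the y-axis is a horizontal bar chart. The sketch below assumes matplotlib is installed; the category labels and counts are made up purely for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical categories with long labels (made-up counts, for illustration only).
responses = {
    "Exploratory data analysis and identifying data issues": 34,
    "Communicating insights and results to stakeholders": 27,
    "Monitoring routine performance indicators over time": 18,
    "Teaching statistical concepts to new analysts": 9,
}
labels = list(responses)
counts = list(responses.values())

fig, ax = plt.subplots(figsize=(8, 3))
ax.barh(labels, counts)  # categories go on the y-axis, so labels read horizontally
ax.invert_yaxis()        # keep the first category at the top of the chart
ax.set_xlabel("Number of responses")
fig.tight_layout()       # make room for the long labels
```

Calling fig.savefig with a file name of your choice writes the chart to disk.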

Should the axes start at 0?

They don’t always have to start at zero…
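
A rough way to think about the question for bar charts, where bar length encodes the value, is to compare the ratio of the drawn bar lengths with the ratio of the underlying values. The helper below is a hypothetical illustration with made-up numbers, not a rule from the guide.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths for values a and b when the axis starts at baseline."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


# Made-up values: 95 and 100.
true_ratio = apparent_ratio(95, 100)               # axis starts at 0
truncated = apparent_ratio(95, 100, baseline=90)   # axis starts at 90

print(round(true_ratio, 3))  # 1.053: the bars look almost equal
print(truncated)             # 2.0: one bar looks twice as long as the other
```

For charts where position rather than length encodes the value, such as line charts, a non-zero starting point does not distort lengths in the same way.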

Order categories appropriately…

Badly ordered chart of covid cases

Order based on magnitude unless the category order has meaning…

Source: Georgia Department of Public Health
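
The ordering rule can be sketched as a small helper: sort by magnitude by default, but keep a meaningful order (such as days of the week) when one is supplied. The function name and the example values are hypothetical.

```python
def order_categories(counts, natural_order=None):
    """Return (category, value) pairs in plotting order.

    If natural_order is given (e.g. weekdays), that order is kept;
    otherwise categories are sorted from largest to smallest value.
    """
    if natural_order is not None:
        return [(c, counts[c]) for c in natural_order if c in counts]
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


# Made-up counts for categories with no inherent order: sort by magnitude.
by_region = order_categories({"North": 12, "South": 30, "East": 21})

# Made-up counts for categories with a meaningful order: keep that order.
weekdays = ["Mon", "Tue", "Wed"]
by_day = order_categories({"Wed": 5, "Mon": 9, "Tue": 2}, natural_order=weekdays)

print(by_region)
print(by_day)
```

The returned pairs can be unpacked into labels and values before being passed to whichever plotting library is in use.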

Lines

  • Suggest an order
  • Suggest continuity

Legends

  • Should not use up valuable space for data
  • May be integrated into the figure

Legends

or use coloured fonts in the subtitle…

Line chart of stick performance

Styling charts

Colours

Why use colours in data visualisation?

  • Colours should serve a purpose, e.g. distinguishing groups of data.

  • Colours can highlight or emphasise parts of your data.

  • Colour is not always the most effective choice, e.g. for communicating differences between variables.

Colours

Different types of colour palettes…


… for different types of data.

Examples of sequential, diverging, and qualitative palettes
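
As a sketch of how the three palette types might be obtained in practice, the snippet below samples colours from matplotlib's built-in colormaps. The choice of "viridis", "RdBu", and "Set2" is an assumption made for illustration; they are not palettes named in the slides.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex


def sample_continuous(name, n=5):
    """Sample n evenly spaced colours from a continuous colormap."""
    cmap = plt.get_cmap(name)
    return [to_hex(cmap(i / (n - 1))) for i in range(n)]


def sample_qualitative(name, n=5):
    """Take the first n distinct colours from a qualitative colormap."""
    cmap = plt.get_cmap(name)
    return [to_hex(cmap(i)) for i in range(n)]


sequential = sample_continuous("viridis")  # ordered data, low to high
diverging = sample_continuous("RdBu")      # data with a meaningful midpoint
qualitative = sample_qualitative("Set2")   # unordered categories

print(sequential)
print(diverging)
print(qualitative)
```

The returned hex codes can be passed to most plotting functions wherever a list of colours is accepted.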

Colours

Is this a good choice of colour?

Colours

Check whether plots are colourblind friendly with colorblindr::cvd_grid(g), where g is the plot object.

Fonts

  • Font size: larger fonts are (usually) better

  • Font colour: ensure sufficient contrast

  • Font face: highlight text using bold font, avoid italics

  • Font family: choose a clear font with distinguishable features (pick something familiar)

There is no perfect font.
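
"Sufficient contrast" for font colour can be checked numerically. The sketch below uses the WCAG 2.x contrast ratio as one possible definition; choosing WCAG, and its 4.5:1 threshold for normal-size text, is an assumption here rather than something the slides specify.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour):
    """Relative luminance of a colour given as '#rrggbb'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (no contrast) up to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


black_on_white = contrast_ratio("#000000", "#ffffff")
grey_on_white = contrast_ratio("#aaaaaa", "#ffffff")

print(round(black_on_white, 1))
print(grey_on_white >= 4.5)  # False: light grey text on white falls below 4.5:1
```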

Key points

  • Charts should have a purpose

  • Actively design visualisations

  • Default settings aren’t always the best choices

Good charts don’t have to be boring!

Cara Thompson (cararthompson.com)

Stacked diverging bar chart of lego colours

Cedric Scherer (cedricscherer.com)

Small multiples area charts of college basketball

Good charts don’t have to be boring!

Tanya Shapiro (tanyaviz.com)

Supreme court justice chart

Dan Oehm (gradientdescending.com)

Sloped area chart

Discussion

In groups, discuss how you might visualise the following data.

study_id | treatment | dosing_regimen_for_scurvy | gum_rot_d6 | skin_sores_d6 | weakness_of_the_knees_d6 | lassitude_d6 | fit_for_duty_d6
001 | cider | 1 quart per day | 2_moderate | 2_moderate | 2_moderate | 2_moderate | 0_no
002 | cider | 1 quart per day | 2_moderate | 1_mild | 2_moderate | 3_severe | 0_no
003 | dilute_sulfuric_acid | 25 drops of elixir of vitriol, three times a day | 1_mild | 3_severe | 3_severe | 3_severe | 0_no
004 | dilute_sulfuric_acid | 25 drops of elixir of vitriol, three times a day | 2_moderate | 3_severe | 3_severe | 3_severe | 0_no
005 | vinegar | two spoonfuls, three times daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no
006 | vinegar | two spoonfuls, three times daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no
007 | sea_water | half pint daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no
008 | sea_water | half pint daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no
009 | citrus | two lemons and an orange daily | 1_mild | 1_mild | 0_none | 1_mild | 0_no
010 | citrus | two lemons and an orange daily | 0_none | 0_none | 0_none | 0_none | 1_yes
011 | purgative_mixture | a nutmeg-sized paste of garlic, mustard seed, horseradish, balsam of Peru, and gum myrrh three times a day | 3_severe | 3_severe | 3_severe | 3_severe | 0_no
012 | purgative_mixture | a nutmeg-sized paste of garlic, mustard seed, horseradish, balsam of Peru, and gum myrrh three times a day | 3_severe | 3_severe | 3_severe | 3_severe | 0_no

A Treatise on the Scurvy in Three Parts. James Lind. 1757.

Discussion

If you want to access the data:

R

library(medicaldata)  # package that provides the scurvy dataset
data("scurvy")        # load the dataset into the workspace

Python

import pandas as pd
# Read the TidyTuesday copy of the scurvy data directly from GitHub
url = 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-07-25/scurvy.csv'
scurvy = pd.read_csv(url)
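
Before plotting, it may help to turn the coded severities (such as 2_moderate) into numbers and reshape the data so that each row is one participant and symptom. The sketch below does this with plain Python on three rows copied from the table above; the helper names are hypothetical, and the same reshaping could be done on the full data with pandas.

```python
# Three rows copied from the scurvy table (symptom columns only).
rows = [
    {"study_id": "001", "treatment": "cider", "gum_rot_d6": "2_moderate",
     "skin_sores_d6": "2_moderate", "weakness_of_the_knees_d6": "2_moderate",
     "lassitude_d6": "2_moderate"},
    {"study_id": "009", "treatment": "citrus", "gum_rot_d6": "1_mild",
     "skin_sores_d6": "1_mild", "weakness_of_the_knees_d6": "0_none",
     "lassitude_d6": "1_mild"},
    {"study_id": "010", "treatment": "citrus", "gum_rot_d6": "0_none",
     "skin_sores_d6": "0_none", "weakness_of_the_knees_d6": "0_none",
     "lassitude_d6": "0_none"},
]
SYMPTOMS = ["gum_rot_d6", "skin_sores_d6", "weakness_of_the_knees_d6", "lassitude_d6"]


def severity(code):
    """Extract the numeric score from a code such as '2_moderate'."""
    return int(code.split("_", 1)[0])


# Long format: one record per participant and symptom, ready for a tile or dot plot.
long_rows = [
    {
        "study_id": row["study_id"],
        "treatment": row["treatment"],
        "symptom": symptom.replace("_d6", ""),
        "severity": severity(row[symptom]),
    }
    for row in rows
    for symptom in SYMPTOMS
]


def mean_severity(treatment):
    """Average severity across all symptoms and participants for one treatment."""
    scores = [r["severity"] for r in long_rows if r["treatment"] == treatment]
    return sum(scores) / len(scores)


print(mean_severity("cider"))
print(mean_severity("citrus"))
```

In long format, symptom can map to one axis, study_id or treatment to the other, and severity to colour.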

Discussion

Nicola Rennie (nrennie.rbind.io)

Scurvy chart example

Georgios Karamanis (karaman.is)

Split tile chart of scurvy data

Read the guide: rss.org.uk/datavisguide

Questions?