Leeds Institute for Data Analytics
18 April 2024
Lecturer in Health Data Science within the Centre for Health Informatics, Computing, and Statistics.
Background in statistics, operational research (transport), and data science consultancy.
Collaborate with local NHS trusts on data science projects.
Co-author of Royal Statistical Society’s Best Practices for Data Visualisation guidance.
How did the Royal Statistical Society’s Best Practices for Data Visualisation guide start?
A survey in 2021 asked Royal Statistical Society (RSS) members for their views on Significance magazine.
Respondents were asked, “What aspects of content could be improved?”
“RSS publications seek data visualisation expert to develop best practice guidance.”
Andreas Krause, Idorsia Pharmaceuticals
Nicola Rennie, Lancaster University
Brian Tarran, Royal Statistical Society
The guide would:
In this session we will cover…
why you should visualise data;
some guidelines for making better charts;
examples of good and bad charts!
Data visualisation serves several main purposes:
Grab attention
Visualisations stand out. If a reader is short on time or uncertain about whether a document is of interest, an attention-grabbing visualisation may entice them to start reading.
Improve access to information
Textual descriptions can be lengthy and hard to read, and are frequently less precise than a visual depiction showing data points and axes.
Summarise content
Visual displays allow for summarising complex textual content, aiding the reader in memorising key points.
John Snow collected data on cholera deaths and created a visualisation where the number of deaths was represented by the height of a bar at the corresponding address in London.
This visualisation showed that the deaths clustered around Broad Street, which helped identify the source of the outbreak: the Broad Street water pump.
Snow. 1854.
“All variations [of pie charts] lead to overestimation of small values and underestimation of large ones.” Kosara and Skau (2016)
On the plot on the left, how tall is the bar?
Data visualisations must serve a purpose.
Ask yourself:
Longer labels are best on the y-axis, horizontally.
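A minimal sketch of this in R with ggplot2 (our assumption; the slides do not show the plotting code). The category names and counts are made up for illustration. Mapping the category to the y aesthetic produces horizontal bars whose labels read left to right.

```r
library(ggplot2)

# Hypothetical categories and counts, for illustration only
df_labels <- data.frame(
  category = c("A category with a fairly long name",
               "Another category with a long name",
               "A third, equally wordy category"),
  count = c(12, 30, 21)
)

# Mapping the long labels to y gives horizontal bars, so each label
# is read left to right without rotating or truncating the text
g_labels <- ggplot(df_labels, aes(x = count, y = category)) +
  geom_col() +
  labs(x = "Count", y = NULL)
```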
Should the axes start at 0?
They don’t always have to start at zero…
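A common rule of thumb is that bar charts should start at zero, because the length of a bar encodes its value, whereas line charts encode values by position and can use a narrower range. A sketch in R with ggplot2 (assumed tooling), using made-up yearly values:

```r
library(ggplot2)

# Hypothetical yearly values, for illustration only
df_years <- data.frame(year = 2015:2020, value = c(96, 97, 95, 98, 99, 97))

# Bar chart: bar length encodes the value, so keep the zero baseline
# (geom_col() draws each bar from 0, so 0 is always in the y range)
g_bar <- ggplot(df_years, aes(x = year, y = value)) +
  geom_col()

# Line chart: position encodes the value, so the y-axis can start
# near the data, making small changes easier to see
g_line <- ggplot(df_years, aes(x = year, y = value)) +
  geom_line() +
  geom_point()

# If zero matters for the message, force it into the line chart's range
g_line_zero <- g_line + expand_limits(y = 0)
```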
Order categories appropriately…
Order based on magnitude unless the category order has meaning…
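Assuming R with ggplot2, base R's reorder() sorts factor levels by a summary of another variable (the mean, by default), which orders bars by magnitude. For categories that already have a meaningful order, such as severity levels, the levels are set explicitly instead. The data below are hypothetical.

```r
library(ggplot2)

# Hypothetical unordered (nominal) categories, for illustration only
df_groups <- data.frame(
  group = c("B", "D", "A", "C"),
  count = c(40, 10, 25, 55)
)

# Sort levels by magnitude; on a discrete y-axis the first level is drawn
# at the bottom, so the largest bar ends up at the top
df_groups$group_sorted <- reorder(df_groups$group, df_groups$count)

g_sorted <- ggplot(df_groups, aes(x = count, y = group_sorted)) +
  geom_col() +
  labs(x = "Count", y = NULL)

# Ordinal categories keep their natural order rather than being sorted
severity <- factor(c("mild", "severe", "moderate"),
                   levels = c("mild", "moderate", "severe"))
```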
Source: Georgia Department of Public Health
or use coloured fonts in the subtitle…
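One way to colour words in a subtitle, assuming R with ggplot2 and the ggtext package (neither is named on the slide): write the subtitle as HTML spans and render it with ggtext::element_markdown(), so the coloured group names replace a legend. The data are made up; the two colours are taken from the Okabe-Ito palette.

```r
library(ggplot2)

# Hypothetical weekly counts for two groups, for illustration only
df_weeks <- data.frame(
  week  = rep(1:5, times = 2),
  cases = c(5, 7, 6, 9, 8, 3, 4, 4, 5, 6),
  group = rep(c("Group A", "Group B"), each = 5)
)
cols <- c("Group A" = "#0072B2", "Group B" = "#D55E00")

# Subtitle written as HTML: each group name takes its line's colour
subtitle_html <- paste0(
  "<span style='color:", cols[["Group A"]], "'>Group A</span> vs ",
  "<span style='color:", cols[["Group B"]], "'>Group B</span>"
)

g_sub <- ggplot(df_weeks, aes(x = week, y = cases, colour = group)) +
  geom_line() +
  scale_colour_manual(values = cols) +
  labs(title = "Weekly cases", subtitle = subtitle_html) +
  theme(legend.position = "none")  # the subtitle now acts as the legend

# ggtext renders the HTML; without it the tags would appear as plain text
if (requireNamespace("ggtext", quietly = TRUE)) {
  g_sub <- g_sub + theme(plot.subtitle = ggtext::element_markdown())
}
```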
Why use colours in data visualisation?
Colours should serve a purpose, e.g. distinguishing groups of data.
Colours can highlight or emphasise parts of your data.
Colour is not always the most effective choice for, e.g., communicating differences between variables.
Different types of colour palettes for different types of data.
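A common classification is sequential palettes for ordered values, diverging palettes for values either side of a meaningful midpoint, and qualitative palettes for unordered categories. A sketch using ggplot2's ColorBrewer scales (the tooling and palette names are our choice, not the guide's), applied to a made-up grid of values:

```r
library(ggplot2)

# Sequential: ordered values from low to high (e.g. counts or rates)
seq_scale  <- scale_fill_distiller(palette = "Blues", direction = 1)

# Diverging: values either side of a meaningful midpoint; symmetric
# limits keep the midpoint of the palette at 0
div_scale  <- scale_fill_distiller(palette = "RdBu", limits = c(-1, 1))

# Qualitative: unordered categories (e.g. treatment groups)
qual_scale <- scale_fill_brewer(palette = "Set2")

# Hypothetical 5 x 5 grid of values from -1 to 1, for illustration only
df_grid <- expand.grid(x = 1:5, y = 1:5)
df_grid$value <- (df_grid$x - 3) / 2

g_div <- ggplot(df_grid, aes(x = x, y = y, fill = value)) +
  geom_tile() +
  div_scale
```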
Is this a good choice of colour?
Check whether plots are colourblind-friendly with colorblindr::cvd_grid(g), where g is a ggplot2 plot.
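A sketch of the check in context, assuming ggplot2 and that colorblindr is installed (it is not on CRAN, so the code skips the check if the package is missing). The colorspace package, on CRAN, can also simulate colour vision deficiency for a palette directly; the example palette is our choice.

```r
library(ggplot2)

# An example plot to check; the mpg data ship with ggplot2
g <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point()

# colorblindr is installed from GitHub at the time of writing,
# e.g. remotes::install_github("clauswilke/colorblindr")
if (requireNamespace("colorblindr", quietly = TRUE)) {
  # Draws the plot as it might appear under deuteranomaly, protanomaly,
  # tritanomaly, and full desaturation
  print(colorblindr::cvd_grid(g))
}

# Simulate deuteranopia for a palette of three Okabe-Ito colours
pal <- c("#E69F00", "#56B4E9", "#009E73")
if (requireNamespace("colorspace", quietly = TRUE)) {
  pal_deutan <- colorspace::deutan(pal)
}
```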
Font size: larger fonts are (usually) better
Font colour: ensure sufficient contrast
Font face: highlight text using bold font, avoid italics
Font family: choose a clear font with distinguishable features (pick something familiar)
There is no perfect font.
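A sketch of a ggplot2 theme following these font guidelines, applied to the built-in mtcars data. ggplot2 and the specific sizes and colours are assumptions, not taken from the guide.

```r
library(ggplot2)

# Larger text, bold (not italic) emphasis, and dark text on a light
# background; base_family is left as the device's default sans font
font_theme <- theme_minimal(base_size = 14) +  # ggplot2's default is 11
  theme(
    plot.title = element_text(face = "bold"),
    axis.text  = element_text(colour = "grey20")
  )

# mtcars ships with base R
g_font <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Heavier cars tend to have lower fuel economy",
       x = "Weight (1000 lbs)", y = "Miles per gallon") +
  font_theme
```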
Charts should have a purpose
Actively design visualisations
Default settings aren’t always the best choices
In groups, discuss how you might visualise the following data.
study_id | treatment | dosing_regimen_for_scurvy | gum_rot_d6 | skin_sores_d6 | weakness_of_the_knees_d6 | lassitude_d6 | fit_for_duty_d6 |
---|---|---|---|---|---|---|---|
001 | cider | 1 quart per day | 2_moderate | 2_moderate | 2_moderate | 2_moderate | 0_no |
002 | cider | 1 quart per day | 2_moderate | 1_mild | 2_moderate | 3_severe | 0_no |
003 | dilute_sulfuric_acid | 25 drops of elixir of vitriol, three times a day | 1_mild | 3_severe | 3_severe | 3_severe | 0_no |
004 | dilute_sulfuric_acid | 25 drops of elixir of vitriol, three times a day | 2_moderate | 3_severe | 3_severe | 3_severe | 0_no |
005 | vinegar | two spoonfuls, three times daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
006 | vinegar | two spoonfuls, three times daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
007 | sea_water | half pint daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
008 | sea_water | half pint daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
009 | citrus | two lemons and an orange daily | 1_mild | 1_mild | 0_none | 1_mild | 0_no |
010 | citrus | two lemons and an orange daily | 0_none | 0_none | 0_none | 0_none | 1_yes |
011 | purgative_mixture | a nutmeg-sized paste of garlic, mustard seed, horseradish, balsam of Peru, and gum myrrh three times a day | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
012 | purgative_mixture | a nutmeg-sized paste of garlic, mustard seed, horseradish, balsam of Peru, and gum myrrh three times a day | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
A Treatise on the Scurvy in Three Parts. James Lind. 1757.
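One possible answer to the exercise, sketched in R with ggplot2 (assumed tooling): reshape the data to one row per participant per symptom, then draw a heatmap of day-6 severity grouped by treatment. The text codes such as 2_moderate are recoded as numbers 0 to 3, and the dosing regimen and fit-for-duty columns are left out for brevity.

```r
library(ggplot2)

# Day-6 symptom scores from the table (0 = none, 1 = mild,
# 2 = moderate, 3 = severe)
scurvy <- data.frame(
  study_id  = sprintf("%03d", 1:12),
  treatment = rep(c("cider", "dilute_sulfuric_acid", "vinegar",
                    "sea_water", "citrus", "purgative_mixture"), each = 2),
  gum_rot_d6               = c(2, 2, 1, 2, 3, 3, 3, 3, 1, 0, 3, 3),
  skin_sores_d6            = c(2, 1, 3, 3, 3, 3, 3, 3, 1, 0, 3, 3),
  weakness_of_the_knees_d6 = c(2, 2, 3, 3, 3, 3, 3, 3, 0, 0, 3, 3),
  lassitude_d6             = c(2, 3, 3, 3, 3, 3, 3, 3, 1, 0, 3, 3)
)

# Reshape to long format: one row per participant per symptom
symptom_cols   <- c("gum_rot_d6", "skin_sores_d6",
                    "weakness_of_the_knees_d6", "lassitude_d6")
symptom_labels <- c("Gum rot", "Skin sores", "Weak knees", "Lassitude")
long <- data.frame(
  study_id  = rep(scurvy$study_id, times = length(symptom_cols)),
  treatment = rep(scurvy$treatment, times = length(symptom_cols)),
  symptom   = factor(rep(symptom_labels, each = nrow(scurvy)),
                     levels = symptom_labels),
  severity  = factor(unlist(scurvy[symptom_cols], use.names = FALSE),
                     levels = 0:3,
                     labels = c("none", "mild", "moderate", "severe"))
)

# One tile per participant and symptom, one panel per treatment, and a
# sequential palette because severity is ordered
g_scurvy <- ggplot(long, aes(x = symptom, y = study_id, fill = severity)) +
  geom_tile(colour = "white") +
  facet_grid(treatment ~ ., scales = "free_y", space = "free_y") +
  scale_fill_brewer(palette = "Reds") +
  labs(x = NULL, y = "Participant", fill = "Severity (day 6)") +
  theme_minimal() +
  theme(strip.text.y = element_text(angle = 0))  # horizontal treatment labels
```

In this layout the citrus rows stand out as the only participants with mild or no symptoms on day 6.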
Quarto is an open-source scientific and technical publishing system that allows you to combine text, images, code, plots, and tables in a fully reproducible document.
Quarto supports multiple languages, including R, Python, Julia, and Observable JS. It can produce a range of output formats, such as PDFs, HTML documents, websites, presentations,…
The source code for the guide is stored on GitHub.
If you want to contribute to the guide, the easiest way is via GitHub.