Centre for Marketing Analytics and Forecasting: Friday Forecasting Talks
15 November 2024
Lecturer in Health Data Science within the Centre for Health Informatics, Computing, and Statistics.
Background in statistics, operational research (transport), and data science consultancy.
Collaborate with local NHS trusts on data science projects.
Co-author of Royal Statistical Society’s Best Practices for Data Visualisation guidance.
How did the Royal Statistical Society’s Best Practices for Data Visualisation guide start?
A survey in 2021 asked Royal Statistical Society (RSS) members for their views on Significance magazine.
Respondents were asked, “What aspects of content could be improved?”
“RSS publications seek data visualisation expert to develop best practice guidance.”
Andreas Krause, Idorsia Pharmaceuticals
Nicola Rennie, Lancaster University
Brian Tarran, Royal Statistical Society
The guide would:
In this session we will cover…
why you should visualise data;
some guidelines for making better charts;
examples of good and bad charts!
Data visualisation has two main purposes:
Grab attention
Visualisations stand out. If a reader is short on time or uncertain about whether a document is of interest, an attention-grabbing visualisation may entice them to start reading.
Improve access to information
Textual descriptions can be lengthy and hard to read, and are frequently less precise than a visual depiction showing data points and axes.
Summarise content
Visual displays allow for summarising complex textual content, aiding the reader in memorising key points.
Data visualisations must serve a purpose.
Ask yourself:
“All variations [of pie charts] lead to overestimation of small values and underestimation of large ones.” Kosara and Skau (2016)
On the plot on the left, how tall is the bar?
Longer labels are best placed on the y-axis, where they can be written horizontally.
Box plots hide information.
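As a minimal illustration of how a box plot can hide information, the sketch below builds two small datasets that share the same five-number summary (minimum, lower quartile, median, upper quartile, maximum) but have different shapes. The data are hypothetical and invented for this example, and `five_number_summary` is an illustrative helper name rather than anything from the talk. Quartiles are computed with the "inclusive" method from Python's standard `statistics` module; other quartile definitions may give slightly different values.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a basic box plot draws."""
    ordered = sorted(data)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])


# Hypothetical data, invented for illustration only.
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]
two_clusters = [1, 3, 3, 3, 5, 7, 7, 7, 9]  # values pile up at 3 and 7

summary_a = five_number_summary(evenly_spread)
summary_b = five_number_summary(two_clusters)

print("evenly spread:", summary_a)
print("two clusters: ", summary_b)
print("same box plot summary:", summary_a == summary_b)
```

Both datasets produce the same summary, so their box plots would look the same even though one is evenly spread and the other is clustered. Showing the raw points, or a histogram or density, would reveal the difference.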
Should the axes start at 0?
They don’t always have to start at zero…
Source: Georgia Department of Public Health
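Returning to the question of whether axes should start at zero, one common reading (offered here as an assumption, not as the talk's own wording) is that bar charts encode values by bar length, so a truncated axis changes the ratio between bars, whereas charts that encode values by position, such as line charts, are less affected. The sketch below shows only the arithmetic for bars. The values and the helper name `drawn_bar_lengths` are hypothetical.

```python
def drawn_bar_lengths(values, baseline):
    """Length of each bar as drawn when the value axis starts at `baseline`."""
    return [value - baseline for value in values]


# Hypothetical data, invented for illustration only.
values = [50, 55]

from_zero = drawn_bar_lengths(values, baseline=0)
truncated = drawn_bar_lengths(values, baseline=45)

# Ratio of the second value (or bar) to the first.
true_ratio = values[1] / values[0]
from_zero_ratio = from_zero[1] / from_zero[0]
truncated_ratio = truncated[1] / truncated[0]

print("ratio of the values:         ", true_ratio)
print("bar ratio, axis from 0:      ", from_zero_ratio)
print("bar ratio, axis from 45:     ", truncated_ratio)
```

With the axis starting at zero, the bars keep the same ratio as the data. With the axis starting at 45, the second bar is drawn twice as long as the first even though the value is only 10% larger.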
Order categories appropriately…
Default:
Magnitude ordered:
Naturally ordered:
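The three orderings above can be sketched in code. This is a minimal example under stated assumptions: the counts are hypothetical, the "default" is taken to be alphabetical order (a common default in plotting software), and the natural order is supplied by hand because software cannot infer it from the labels.

```python
# Hypothetical survey counts, invented for illustration only.
counts = {"Medium": 42, "Low": 17, "High": 8}

# Default: alphabetical order of the category labels.
default_order = sorted(counts)

# Magnitude ordered: largest count first.
magnitude_order = sorted(counts, key=counts.get, reverse=True)

# Naturally ordered: the categories have an inherent order that we state explicitly.
natural_levels = ["Low", "Medium", "High"]
natural_order = [level for level in natural_levels if level in counts]

for name, order in [
    ("default", default_order),
    ("magnitude", magnitude_order),
    ("natural", natural_order),
]:
    print(f"{name:>9}: " + ", ".join(f"{label} ({counts[label]})" for label in order))
```

Magnitude ordering usually suits categories with no inherent order, while categories such as Low, Medium, High are easier to read in their natural order.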
Avoid spaghetti plots!
Alternatives to spaghetti:
Why use colours in data visualisation?
Colours should serve a purpose, e.g. discerning groups of data.
Colours can highlight or emphasise parts of your data.
Colour is not always the most effective choice, e.g. for communicating differences between variables.
Different types of colour palettes…
… for different types of data.
Is this a good choice of colour?
Check for colourblind-friendly plots with colorblindr::cvd_grid(g).
Charts should have a purpose
Actively design visualisations
Default settings aren’t always the best choices
Every rule should be broken for some visualisations
Quarto is an open-source scientific and technical publishing system that allows you to combine text, images, code, plots, and tables in a fully reproducible document.
Quarto supports multiple languages, including R, Python, Julia, and Observable. It works for a range of output formats, including PDFs, HTML documents, websites, and presentations.
The source code for the guide is stored on GitHub.
If you want to contribute to the guide, the easiest way is via GitHub.
GitHub link: github.com/royal-statistical-society/datavisguide
Contributor guide: royal-statistical-society.github.io/datavisguide/howto.html#how-to-contribute-to-this-guide
Add issues or contribute to discussions about the guide.