Data visualisation specialist.
Mainly working with R, Python, and D3.
Background in statistics, operational research, and data science consultancy.
This workshop combines slides, examples, and discussions for you to participate in.
Ask questions throughout!
I hope you end up with more questions than answers after this workshop!
Course website: nrennie.rbind.io/training-data-visualisation
In this session we will cover…
why you should visualise data;
choosing a chart type;
some guidelines for making better charts;
examples of good and bad charts!
Data visualisation has two main purposes:
| Statistic | Value |
|---|---|
| Mean(x) | 54.26 |
| Mean(y) | 47.83 |
| Standard deviation(x) | 16.77 |
| Standard deviation(y) | 26.94 |
| Correlation(x, y) | -0.06 |
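Summary statistics like those in the table compress a dataset into a handful of numbers and say nothing about its shape. As a minimal sketch of how such values might be computed for any paired data, the function below uses only the Python standard library. The function name `summary_statistics` is hypothetical, the sample standard deviation (dividing by n - 1) is an assumption about how the table was produced, and the example data is made up rather than taken from the workshop dataset.

```python
from math import sqrt


def summary_statistics(x, y):
    """Return means, sample standard deviations, and Pearson correlation.

    Assumes x and y are equal-length sequences of numbers, each with
    at least two points and some variation (otherwise the correlation
    is undefined).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squared deviations and of cross-products
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")

    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sd_x": sqrt(sxx / (n - 1)),  # sample standard deviation (assumed)
        "sd_y": sqrt(syy / (n - 1)),
        "correlation": sxy / sqrt(sxx * syy),
    }


# Made-up illustrative data, not the workshop dataset
stats = summary_statistics([1, 2, 3, 4], [2, 4, 6, 8])
print(stats)
```

Printing the same five numbers for two different datasets and then plotting both is one way to check whether the summary hides a pattern.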
Grab attention
Visualisations stand out. If a reader is short on time or uncertain about whether a document is of interest, an attention-grabbing visualisation may entice them to start reading.
Improve access to information
Textual descriptions can be lengthy and hard to read, and are frequently less precise than a visual depiction showing data points and axes.
Summarise content
Visual displays allow for summarising complex textual content, aiding the reader in memorising key points.
John Snow collected data on cholera deaths and created a visualisation where the number of deaths was represented by the height of a bar at the corresponding address in London.
This visualisation showed that the deaths clustered around Broad Street, which helped identify the source of the cholera outbreak: the Broad Street water pump.
Snow. 1854.
Data visualisations must serve a purpose.
Ask yourself:
Correlation: The relationship between two variables.
Deviation: The difference between a value and an average or another value.
Distribution: How data values are spread for a variable.
Geography: The pattern of data across different locations or areas.
Magnitude: The size of values.
Parts of a whole: The relative sizes of components within a whole.
Ranking: The position of data within a hierarchy or scale.
Time: How a value changes over time.
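One way to put the list above to work is as a simple lookup from the relationship you want to show to a starting chart type. The sketch below is illustrative only: the chart suggestions and the function name `suggest_chart` are assumptions added here, not recommendations taken from the workshop, and each category has many other reasonable options.

```python
# Hypothetical starting points for each relationship; not an exhaustive
# or authoritative mapping.
CHART_SUGGESTIONS = {
    "correlation": "scatter plot",
    "deviation": "diverging bar chart",
    "distribution": "histogram",
    "geography": "choropleth map",
    "magnitude": "bar chart",
    "parts of a whole": "stacked bar chart",
    "ranking": "ordered bar chart",
    "time": "line chart",
}


def suggest_chart(relationship):
    """Return a suggested starting chart type for a relationship name."""
    key = relationship.strip().lower()
    if key not in CHART_SUGGESTIONS:
        raise KeyError(f"Unknown relationship: {relationship!r}")
    return CHART_SUGGESTIONS[key]


print(suggest_chart("Time"))
```

The suggestion is only a first draft: the purpose of the chart and the audience should decide the final choice.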
On the plot on the left, how tall is the bar?
Longer labels are easier to read when placed on the y-axis and written horizontally.
Should the axes start at 0?
They don’t always have to start at zero…
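For bar charts, where the length of the bar encodes the value, the effect of the axis starting point can be checked with a little arithmetic. The sketch below compares how large one bar looks relative to another when the axis starts at zero and when it is truncated. The function name `apparent_ratio` and the values 50 and 52 are made up for illustration.

```python
def apparent_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn lengths of two bars when the axis begins at axis_start.

    Assumes both values lie above axis_start, so both bars have a
    positive drawn length.
    """
    if value_a <= axis_start or value_b <= axis_start:
        raise ValueError("both values must be greater than axis_start")
    return (value_a - axis_start) / (value_b - axis_start)


# Made-up values: 52 is 4% larger than 50
print(apparent_ratio(52, 50))                 # axis starts at 0
print(apparent_ratio(52, 50, axis_start=48))  # axis truncated at 48
```

With the axis truncated at 48, the first bar is drawn twice as long as the second even though the underlying values differ by only 4%. Charts that encode values by position rather than length, such as line charts, are less affected, which is one reason the axis does not always have to start at zero.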
Order categories appropriately…
Source: Georgia Department of Public Health
Default:
Magnitude ordered:
Naturally ordered:
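The three orderings can be sketched in a few lines of Python. The category names and counts below are made up, and treating alphabetical order as the default is an assumption about typical plotting tools rather than something every library does.

```python
# Made-up category counts for illustration
counts = {"Medium": 12, "Large": 5, "Small": 20}

# Default: many tools fall back to alphabetical order (assumption)
default_order = sorted(counts)

# Magnitude ordered: largest value first
magnitude_order = sorted(counts, key=counts.get, reverse=True)

# Naturally ordered: follow the inherent order of the categories,
# which has to be supplied by the chart author
natural_levels = ["Small", "Medium", "Large"]
natural_order = [level for level in natural_levels if level in counts]

print(default_order)    # ['Large', 'Medium', 'Small']
print(magnitude_order)  # ['Small', 'Medium', 'Large']
print(natural_order)    # ['Small', 'Medium', 'Large']
```

In this example the magnitude and natural orders happen to coincide. In general they differ, and categories with an inherent order (sizes, age groups, dates) usually read best in that order.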
Avoid spaghetti plots!
Alternatives to spaghetti:
Some alternatives:
Why use colours in data visualisation?
Colours should serve a purpose, e.g. distinguishing groups of data.
Colours can highlight or emphasise parts of your data.
Colour is not always the most effective choice, e.g. for communicating differences between variables.
Different types of colour palettes…
… for different types of data.
Example: red and blue used to show hot and cold
Tip: never use colours in a way that reverses their familiar meaning!
Example: pink and blue used to show women and men
Tip: think about colour associations.
In groups, discuss the following chart. What is good and bad about it?
Source: commonslibrary.parliament.uk/general-election-2019-how-many-women-were-elected available under Open Parliament Licence.
Charts should have a purpose, and the chart type should support that purpose.
Actively design visualisations with your audience in mind.
Every rule should be broken for some visualisations.