Common relationships

  • Magnitude: The size of values.

  • Distribution: How data values are spread for a variable.

  • Ranking: The position of data within a hierarchy or scale.

  • Deviation: The difference between a value and an average or another value.

  • Parts of a whole: The relative sizes of components within a whole.

  • Correlation: The relationship between two variables.

  • Time: How a value changes over time.

  • Geography: The pattern of data across different locations or areas.

Visual vocabulary

Screenshot of FT chart type poster

It’s not just about the type of data…

Avoid spaghetti plots!

Effectively plotting multiple values

Alternatives to spaghetti:

  • Show a smaller number of lines (e.g. compare a few countries to the average)
  • Use colour only to highlight the lines of interest
  • Use facets (AKA small multiples)

Small multiples allow clearer comparison

Plotting variables on different scales

Effectively plotting on different scales

Some alternatives:

  • Create separate plots, each with its own axis, and place them side by side.
  • Plot the different variables against each other, one on the x-axis and one on the y-axis.
  • Rescale the variables, rather than the axis.
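
One possible way to rescale the variables is to index each one to its first value, so that both start at 100 and can share a single axis. This is only a sketch: the data frame and its column names are hypothetical, and indexing is just one of several rescaling choices (standardising to z-scores is another).

```python
import pandas as pd

# Hypothetical data: two variables measured on very different scales
df = pd.DataFrame({
    'Year': [2020, 2021, 2022, 2023],
    'Population': [1_000_000, 1_020_000, 1_050_000, 1_100_000],
    'Emissions': [4.0, 4.2, 3.9, 3.6],
})

# Index each variable to its first observation (first year = 100)
indexed = df.copy()
for col in ['Population', 'Emissions']:
    indexed[col] = df[col] / df[col].iloc[0] * 100

# Reshape to long format so both variables can be drawn on one y-axis
long = indexed.melt(id_vars='Year', var_name='Variable', value_name='Index')
```

The long-format result could then be plotted with `gg.aes(x='Year', y='Index', color='Variable')`, giving one axis that shows relative change in both variables.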

Set up

# Load packages
import pandas as pd
import plotnine as gg
import matplotlib.pyplot as plt

# Load data
emissions = pd.read_csv('../data/emissions_income.csv')

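# Prep data: treat Income as an ordered category (low to high)
# so that legends and facets follow a meaningful order
income_order = pd.CategoricalDtype(
    categories=[
        'Low-income countries',
        'Lower-middle-income countries',
        'Upper-middle-income countries',
        'High-income countries',
    ],
    ordered=True
)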
emissions['Income'] = emissions['Income'].astype(income_order)

emissions.head()

   Code                         Income  Emissions
0   AFG           Low-income countries   0.253848
1   ALB  Upper-middle-income countries   1.591990
2   DZA  Upper-middle-income countries   4.233817
3   AND          High-income countries   5.181661
4   AGO  Lower-middle-income countries   0.589497

Basic chart

p = (gg.ggplot())

Mapping data to chart

p = (gg.ggplot(emissions, gg.aes(x='Emissions', fill='Income')))

Adding geometry

p = (gg.ggplot(emissions, gg.aes(x='Emissions', fill='Income')) 
    + gg.geom_density(alpha=0.5))

Small multiples

p = (gg.ggplot(emissions, gg.aes(x='Emissions', fill='Income')) 
    + gg.geom_density(alpha=0.5)
    + gg.facet_wrap('Income'))

Arrange for comparison

p = (gg.ggplot(emissions, gg.aes(x='Emissions', fill='Income')) 
    + gg.geom_density(alpha=0.5)
    + gg.facet_wrap('Income', ncol = 1))

Styling

p = (gg.ggplot(emissions, gg.aes(x='Emissions', fill='Income'))
    + gg.geom_density(alpha=0.5)
    + gg.facet_wrap('Income', ncol=1)
    + gg.labs(x='CO₂ emissions per capita', y='')
    + gg.theme_minimal()
    + gg.theme(
        legend_position='none',
        axis_text_y=gg.element_blank()
    ))

Change the chart type

# y=1 is a constant, so all points in a facet sit on one horizontal line
p = (gg.ggplot(emissions, gg.aes(x='Emissions', fill='Income', y=1))
    + gg.geom_point(alpha=0.5, size=3)
    + gg.facet_wrap('Income', ncol=1)
    + gg.labs(x='CO₂ emissions per capita', y='')
    + gg.theme_minimal()
    + gg.theme(
        legend_position='none',
        axis_text_y=gg.element_blank()
    ))

Your turn!

You have been given some data on temperature anomalies and latitude.

  1. Create a chart that shows the trend over time for countries in South America.
  2. Create a chart that compares the temperature anomalies for countries in different world regions in 2025.
  3. Create a chart that compares the temperature anomalies for countries at different latitudes in 2025.

nrennie.rbind.io/MFC-CDT-data-viz

Discussion: temperature anomalies in different regions

Ridgeline chart

Discussion: temperature anomalies at different latitudes

Bubble chart