Practical Techniques for Polished Visuals with Plotnine
Nicola Rennie
PyData Global 2024
Academic background in statistics and operational research
Experience in data science consultancy
Lecturer in Health Data Science at Lancaster Medical School
Interests: data visualisation, reproducible research, …
A brief introduction
A data visualisation library
Brings the Grammar of Graphics to Python
An implementation of ggplot2 from R
Build plots layer by layer, adding components such as data points, lines, and labels to customise your visualisations.
Map columns of DataFrames to different components and properties, e.g. colours.
High-level, declarative syntax, where you specify what you want to plot, rather than how to draw it.
Building an annotated area chart
Carbon Majors is a database of historical production data from 122 of the world’s largest oil, gas, coal, and cement producers. The Carbon Majors dataset is available for download and for non-commercial use, subject to InfluenceMap’s Terms and Conditions.
The dataset has 12,551 rows and 7 columns with the following variables: year, parent_entity, parent_type, commodity, production_value, production_unit, and total_emissions_MtCO2e.
For this visualisation, we’ll look at the total amount of coal produced per year since 1900, broken down by type of coal.
After some data wrangling, the first rows of the data (plot_data.head()) look like this:
year  commodity       n
1900  Anthracite      1.379115
1900  Bituminous      11.312304
1900  Lignite         3.856433
1900  Metallurgical   1.573317
1900  Sub-Bituminous  2.110480
See nrennie.rbind.io/blog/plotnine-annotated-area-chart for data wrangling code.
Colours:
import textwrap
# Title and subtitle text
title_text = 'Coal production since 1900'
st = 'Carbon Majors is a database of historical production data from 122 of the world’s largest oil, gas, coal, and cement producers. This data is used to quantify the direct operational emissions and emissions from the combustion of marketed products that can be attributed to these entities.'
# Wrap the subtitle at 50 characters per line and rejoin with newlines
wrapped_subtitle = '\n'.join(textwrap.wrap(st, width=50))
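To see what the wrapping step does, the sketch below wraps a shortened version of the subtitle and checks that no line is longer than the requested width. `textwrap.wrap` returns a list of lines, which is why the result is joined with a newline before it is used as a label. The variable names `text`, `lines`, and `wrapped` are illustrative.

```python
import textwrap

# First sentence of the subtitle, used here as a short test string
text = ('Carbon Majors is a database of historical production data from '
        '122 of the world’s largest oil, gas, coal, and cement producers.')

# wrap() splits on whitespace so that each line is at most `width` characters
lines = textwrap.wrap(text, width=50)

# Join the lines into one multi-line string suitable for a plot label
wrapped = '\n'.join(lines)

print(wrapped)
print(max(len(line) for line in lines) <= 50)
```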
Import plotnine:
geoms
p = (p +
     # Axis lines
     gg.geom_segment(data=segment_data,
                     mapping=gg.aes(x='year', xend='year', y=0, yend=-1700),
                     linetype='dashed', alpha=0.4, color=text_col) +
     # Axis labels
     gg.geom_text(data=segment_data,
                  mapping=gg.aes(x='year', y=-1900, label='year'),
                  color=text_col, size=8, family=body_font, ha='left') +
     gg.geom_text(data=y_axis_data,
                  mapping=gg.aes(x=2023, y='value', label='label'),
                  color=text_col, size=8, family=body_font, ha='left', va='top'))
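The code above relies on two small helper DataFrames, `segment_data` and `y_axis_data`, whose definitions are not shown here. The sketch below is an assumption about their shape only: the column names come from the `aes()` mappings, while the specific years, values, and labels are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical x-axis positions: one dashed guide line and one label per year
segment_data = pd.DataFrame({'year': [1900, 1925, 1950, 1975, 2000]})

# Hypothetical y-axis positions and the text to print beside each one
y_axis_data = pd.DataFrame({
    'value': [2500, 5000, 7500],
    'label': ['2,500', '5,000', '7,500'],
})
```

Keeping the axis positions in their own DataFrames means the custom axis is drawn with ordinary geoms, so it can be styled like any other layer.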
p = (p +
     # Vertical line, year, and note where production first exceeds 100 million tonnes
     gg.annotate(
         'segment', x=exceeds100, xend=exceeds100, y=0, yend=5000,
         size=1, color=text_col) +
     gg.annotate(
         'text', x=exceeds100 + 2, y=5000, label=exceeds100,
         size=10, color=text_col, family=body_font,
         ha='left', va='top', fontweight='bold') +
     gg.annotate(
         'text', x=exceeds100 + 2, y=5000 - 600,
         label='Total coal production first\nexceeds 100 million tonnes\nper year.',
         size=9, color=text_col, family=body_font, ha='left', va='top') +
     # Vertical line and heading for the coal types annotation
     gg.annotate(
         'segment', x=1975, xend=1975, y=0, yend=10000,
         size=1, color=text_col) +
     gg.annotate(
         'text', x=1975 + 2, y=10000, label='Coal types',
         size=10, color=text_col, family=body_font,
         ha='left', va='top', fontweight='bold'))
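The variable `exceeds100` is defined elsewhere. One plausible way to derive such a value, assuming it is the first year in which production summed over all coal types passes 100, is sketched below. The numbers in `toy` are made up for illustration and are not Carbon Majors data.

```python
import pandas as pd

# Made-up toy data in the same shape as the wrangled data (year, commodity, n)
toy = pd.DataFrame({
    'year': [1900, 1900, 1901, 1901, 1902, 1902],
    'commodity': ['Bituminous', 'Lignite'] * 3,
    'n': [40, 30, 50, 45, 70, 60],
})

# Total production per year across all coal types
totals = toy.groupby('year')['n'].sum()

# First year whose total is above 100 (here: 70, 95, 130 -> 1902)
exceeds100 = int(totals[totals > 100].index.min())
print(exceeds100)
```

Computing the year from the data, rather than hard-coding it, keeps the annotation in the right place if the underlying data is updated.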
p = (p +
     # Text for title and subtitle
     gg.annotate(
         'text', x=1900, y=11400,
         label=title_text, color=text_col, family=body_font,
         ha='left', va='top',
         size=13, fontweight='bold'
     ) +
     gg.annotate(
         'text', x=1900, y=10500,
         label=wrapped_subtitle, color=text_col, family=body_font,
         ha='left', va='top', size=9.5
     ))
Using other libraries with plotnine
highlight-text: a library that makes annotations easier in matplotlib.
It provides a way to specify individual font properties for substrings of text.
For example: different colours, shaded backgrounds, and different font sizes, weights, or styles.
Documentation: pypi.org/project/highlight-text
# annotation labels
coal_types_label = (
    'Total coal production includes\n'
    'production of <Bituminous::{"color": "#E58606"}>,\n'
    '<Sub-bituminous::{"color": "#5D69B1"}>, <Metallurgical::{"color": "#52BCA3"}>,\n'
    '<Lignite::{"color": "#99C945"}>, <Anthracite::{"color": "#CC61B0"}>, and <Thermal::{"color": "#24796C"}>\n'
    'coal. Bituminous accounts\n'
    'for around half.'
)
# caption
cap = '<Data::{"fontweight": "bold"}>: Carbon Majors\n<Graphic::{"fontweight": "bold"}>: Nicola Rennie (@nrennie)'
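The strings above embed styling inline: each `<text::{...}>` chunk pairs a substring with a dictionary of font properties. The sketch below is not how highlight-text parses its input; it is only a standard-library illustration, using a hypothetical helper named `styled_chunks`, of which substrings and properties the caption encodes.

```python
import json
import re

cap = ('<Data::{"fontweight": "bold"}>: Carbon Majors\n'
       '<Graphic::{"fontweight": "bold"}>: Nicola Rennie (@nrennie)')


def styled_chunks(s):
    """Return (substring, properties) pairs for each <text::{...}> chunk."""
    # Group 1 is the visible text, group 2 is the JSON-style property dict
    pattern = re.compile(r'<([^<>:]+)::(\{.*?\})>')
    return [(text, json.loads(props)) for text, props in pattern.findall(s)]


# Show each styled substring next to the properties applied to it
for text, props in styled_chunks(cap):
    print(text, props)
```

Text outside the angle brackets, such as ": Carbon Majors", carries no properties of its own and so takes whatever default styling is passed when the text is drawn.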
Plotnine allows you to build plots one layer at a time.
The syntax is fairly intuitive.
Customising plots takes a bit of work but it’s worth it.
Use libraries that work with matplotlib to gain extra features.
Plotnine documentation: plotnine.org
Plotnine gallery: plotnine.org/gallery
2024 plotnine contest: posit.co/blog/winner-of-the-2024-plotnine-plotting-contest