About me

Data visualisation specialist, mainly using R, Python, and D3.


Background in statistics, operational research, and data science.


Developer of several R packages, mainly for visualisation.


Author of The Art of Visualization with ggplot2.

Cover of The Art of Visualization with ggplot2

What’s this session about?

Aims:

  • Walk through the process of creating a chart with ggplot2
  • A little bit of data visualisation design theory
  • R tips and tricks to improve your charts
  • Show the process, not just the code and chart

During this session:

  • Code along and make a chart!
  • Sit back and relax!
  • Ask lots of questions!

TidyTuesday

  • TidyTuesday is a weekly social data project
  • New dataset uploaded every week
  • Make a chart/dashboard/model/…
  • Share your output and code

TidyTuesday logo

Packages

library(camcorder)
library(dplyr)
library(ggplot2)
library(ggtext)
library(glue)
library(showtext)
library(tidyr)
library(tidytuesdayR)


Session will be demonstrated in RStudio.

TidyTuesday repository

Each week you can access the data (and a README file) on GitHub.

We’ll be using a dataset from 2023 for these examples.

TidyTuesday repository screenshot

Live demo

Load data using the tidytuesdayR package

Code
tuesdata <- tt_load("2023-01-31")
cats <- tuesdata$cats_uk
cats_reference <- tuesdata$cats_uk_reference

What question are we trying to answer?

  • Where do cats go?
  • How fast do cats run?
  • How long do they spend indoors?
  • Does that change as they get older?

Photo of a black cat looking up off camera

Live demo

Where do cats go?

Code
plot(
  x = cats$location_long,
  y = cats$location_lat,
  xlab = "Longitude",
  ylab = "Latitude"
)

How fast do cats run?

Code
hist(
  x = cats$ground_speed,
  xlab = "Ground speed (m/s)",
  main = "Histogram of ground speed"
)

How long do they spend indoors?

Code
barplot(table(cats_reference$hrs_indoors))

Does that change as they get older?

Code
plot(
  x = cats_reference$age_years,
  y = cats_reference$hrs_indoors,
  xlab = "Age",
  ylab = "Hours indoors"
)

Do older cats spend more time indoors?

Sketching ideas

Don’t jump straight into writing code for the final chart yet!

  • Sketching is an important part of the design process.
  • Explore visualisations (in the same way that you explore the data before deciding how to process it).
  • Refine your design ideas.
  • Throw it away if it doesn’t work.

Initial sketch

Initial sketch of chart

Add some detail to the sketch

Sketch of chart

What data do we need?

  • The number of cats in each unique combination of hrs_indoors and age_years.

  • We only need to use the cats_reference data.

  • Is all of the data in the format that we expect?
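One quick way to check is to look at column types and missing values before wrangling. The sketch below uses a small hypothetical stand-in for cats_reference (only the two columns we need, with made-up values) so it runs on its own; with the real data you would use cats_reference instead of cats_toy.

```r
# Hypothetical stand-in for cats_reference: made-up values, same column names
cats_toy <- data.frame(
  age_years   = c(4, 11, NA, 7, 2),
  hrs_indoors = c(22.5, 17.5, 12.5, NA, 22.5)
)

# Are the columns the types we expect? Both should be numeric here
col_types <- vapply(cats_toy, class, character(1))
print(col_types)

# How many values are missing in each column?
n_missing <- colSums(is.na(cats_toy))
print(n_missing)

# How many distinct values does hrs_indoors take (including NA)?
print(table(cats_toy$hrs_indoors, useNA = "ifany"))
```

In the real data, hrs_indoors appears to take only a handful of distinct values, which is why the wrangling step treats it as a factor rather than a continuous variable.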

Live demo

Data wrangling

Code
plot_data <- cats_reference |>
  select(age_years, hrs_indoors) |>
  mutate(hrs_indoors = factor(hrs_indoors)) |>
  count(age_years, hrs_indoors) |>
  drop_na()
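To see what this pipeline produces, here is the same sequence of steps run on a tiny hypothetical data frame (made-up values, not the study data). count() returns one row per unique combination, with the number of rows in column n, and drop_na() then removes any combination involving a missing value.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for cats_reference (made-up values)
cats_toy <- tibble(
  animal_id   = c("A", "B", "C", "D", "E", "F"),
  age_years   = c(4, 4, 4, 9, 9, NA),
  hrs_indoors = c(22.5, 22.5, 17.5, 22.5, 22.5, 12.5)
)

plot_data_toy <- cats_toy |>
  select(age_years, hrs_indoors) |>
  # treat the hours as categories rather than a continuous scale
  mutate(hrs_indoors = factor(hrs_indoors)) |>
  # one row per (age, hours) combination; n = number of cats
  count(age_years, hrs_indoors) |>
  # remove the combination whose age is missing
  drop_na()

print(plot_data_toy)
```

The cat with a missing age is dropped, leaving three combinations covering the remaining five cats.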

Initial draft of a plot

Code
basic_plot <- ggplot(
  data = plot_data,
  mapping = aes(
    x = age_years,
    y = hrs_indoors,
    size = n
  )
) +
  geom_point()
basic_plot

Live demo

Before we go any further!

Have you ever spent ages tinkering with a plot you’re previewing in RStudio…

Stacked bar chart

Before we go any further!

…and then used ggsave() to save an image, and ended up with something like this:

Stacked bar chart not looking very nice

Why does this happen?

  • This happens because we preview the chart at 96 dpi (low resolution) in RStudio, but ggsave() saves it at 300 dpi (high resolution) by default.

  • We could set ggsave(dpi = 96), but we usually don’t want to save a low-resolution image.

  • Instead, we want to preview the chart in RStudio at the same higher resolution we’ll save it at.
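The difference in resolution is easy to quantify: an image's size in pixels is its physical size in inches multiplied by the dpi, and a font size in points (1 point = 1/72 inch) converts to pixels as points / 72 × dpi. The two helpers below are hypothetical, written only to show the arithmetic for the 6 by 4 inch chart used in this session.

```r
# Hypothetical helper: pixel dimensions of an image at a given dpi
px_size <- function(width_in, height_in, dpi) {
  c(width = width_in * dpi, height = height_in * dpi)
}

# Hypothetical helper: how many pixels a font size in points covers
pt_to_px <- function(pt, dpi) pt / 72 * dpi

preview <- px_size(6, 4, dpi = 96)   # 576 x 384 pixels
saved   <- px_size(6, 4, dpi = 300)  # 1800 x 1200 pixels
print(rbind(preview, saved))

# The same 11 pt text covers roughly three times as many pixels at 300 dpi
print(c(preview = pt_to_px(11, 96), saved = pt_to_px(11, 300)))
```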

How do we fix this?

camcorder: github.com/thebioengineer/camcorder

  • Automatically saves an image every time you print a plot
  • Make a .gif using those image files
  • Preview the ggplot2 output directly with your specifications in the RStudio IDE: what you see is what you get

camcorder hex sticker

How do we fix this?

Gif showing the process of making a chart

How do we fix this?

ggview: github.com/idmn/ggview

  • Preview the ggplot2 output directly with your specifications in the RStudio IDE: what you see is what you get

ggview hex sticker

Live demo

Set up camcorder

Code
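library(camcorder)
# Record every plot to an image file, using the same size and
# resolution that ggsave() will use at the end (6 x 4 inches, 300 dpi)
gg_record(
  width = 6,
  height = 4,
  units = "in",
  dpi = 300
)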

What do we want to change about this plot?

Colours

How do we choose good colours?

Three colour palette examples

Live demo

Create variable for colours

Code
text_col <- "#152826"
highlight_col <- "#914D76"
bg_col <- "white"
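One way to sanity-check these choices (a technique not covered in the session) is the WCAG 2 contrast ratio between a colour and the background. It ranges from 1 (no contrast) to 21 (black on white), and WCAG asks for at least 4.5 for normal-sized text. The sketch below implements that formula in base R with col2rgb(); the function names are hypothetical.

```r
# Relative luminance of a colour, following the WCAG 2 definition
relative_luminance <- function(col) {
  rgb <- col2rgb(col)[, 1] / 255          # red, green, blue scaled to [0, 1]
  lin <- ifelse(rgb <= 0.03928,           # undo the sRGB gamma encoding
                rgb / 12.92,
                ((rgb + 0.055) / 1.055)^2.4)
  sum(c(0.2126, 0.7152, 0.0722) * lin)    # weighted sum of the channels
}

# Contrast ratio: (lighter luminance + 0.05) / (darker luminance + 0.05)
contrast_ratio <- function(col1, col2) {
  l <- sort(c(relative_luminance(col1), relative_luminance(col2)),
            decreasing = TRUE)
  (l[1] + 0.05) / (l[2] + 0.05)
}

text_col <- "#152826"
highlight_col <- "#914D76"
bg_col <- "white"

contrast_ratio(text_col, bg_col)       # roughly 15
contrast_ratio(highlight_col, bg_col)  # roughly 6
```

Both colours clear the 4.5 threshold against the white background, with the text colour giving much stronger contrast than the highlight colour.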

Update basic chart

Code
basic_plot <- ggplot(
  data = plot_data,
  mapping = aes(
    x = age_years,
    y = hrs_indoors,
    size = n
  )
) +
  geom_point(color = highlight_col) +
  scale_size(breaks = c(3, 6, 9))

Fonts

  • Fonts in R are often cited as one of the trickiest aspects of visualisation.
  • There are lots of different packages: which one should you use?
    • Local font files -> ragg and systemfonts
    • Google fonts -> showtext

Live demo

Loading fonts

Code
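# Download fonts from Google Fonts (via sysfonts, which showtext loads)
font_add_google(name = "Chewy")
font_add_google(name = "Ubuntu")
# Use showtext to render text in all subsequent plots
showtext_auto()
# Render text at 300 dpi, matching the resolution used to save the chart
showtext_opts(dpi = 300)
title_font <- "Chewy"
body_font <- "Ubuntu"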

Data-driven annotations

We could type out text for the annotations such as:

"The value of x at time 25 is 123,765."

But this means that:

  • You have to manually look up the values
  • There’s a risk of making a typo
  • You have to remember to update it if the data changes

Data-driven annotations

There are different ways to combine variables with text in R.

  • paste() / paste0()
  • sprintf()
  • glue()

Glue logo
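Here is the example sentence from earlier built with each of the three approaches. The values 25 and 123765 are illustrative, taken from that sentence; in practice they would be computed from the data.

```r
library(glue)

# Illustrative values that would normally come from the data
x_time <- 25L
x_value <- 123765

# Add a thousands separator: "123,765"
value_fmt <- format(x_value, big.mark = ",")

# 1. paste0(): concatenate pieces of text with no separator
txt_paste <- paste0("The value of x at time ", x_time, " is ", value_fmt, ".")

# 2. sprintf(): C-style placeholders (%d for integers, %s for strings)
txt_sprintf <- sprintf("The value of x at time %d is %s.", x_time, value_fmt)

# 3. glue(): R expressions inside {} are evaluated and inserted
txt_glue <- glue("The value of x at time {x_time} is {value_fmt}.")

txt_paste
txt_sprintf
txt_glue
```

All three give the same string. glue() tends to be the easiest to read when a sentence contains several variables, because each value sits where it appears in the text.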

Live demo

Making data-driven text with glue()

Code
annot_oldest <- cats_reference |>
  # keep the row(s) for the oldest cat
  slice_max(age_years) |>
  # build the annotation text from the data
  mutate(label = glue("The oldest cat is {animal_id} who is {age_years} years old.")) |>
  select(hrs_indoors, age_years, label)
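glue() is vectorised, so inside mutate() it builds one label per row. One thing to watch is that slice_max() keeps ties by default: if two cats shared the maximum age, you would get two labels. Setting with_ties = FALSE keeps exactly n rows. A sketch on hypothetical data (the cat names and values are made up):

```r
library(dplyr)
library(glue)

# Hypothetical stand-in for cats_reference: two cats share the maximum age
cats_toy <- tibble(
  animal_id   = c("Tom", "Luna", "Milo"),
  age_years   = c(16, 16, 3),
  hrs_indoors = c(22.5, 17.5, 12.5)
)

# Default behaviour: ties are kept, so both 16-year-old cats are returned
all_oldest <- cats_toy |>
  slice_max(age_years)

# with_ties = FALSE keeps exactly one row
annot_toy <- cats_toy |>
  slice_max(age_years, n = 1, with_ties = FALSE) |>
  mutate(label = glue("The oldest cat is {animal_id} who is {age_years} years old.")) |>
  select(hrs_indoors, age_years, label)

annot_toy$label
```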

Adding annotation to chart

Code
annotated_plot <- basic_plot +
  geom_textbox(
    data = annot_oldest,
    mapping = aes(
      x = age_years - 2.5,
      y = factor(hrs_indoors),
      label = label
    ),
    halign = 0.5,
    hjust = 0.5,
    size = 2.5,
    lineheight = 0.5,
    family = body_font,
    box.color = text_col,
    color = text_col,
    maxwidth = unit(1, "in")
  )

Additional text

There are several arguments in labs() that we can use to add text to our chart:

  • title
  • subtitle
  • caption
  • tag
  • x/y axis labels
  • legend titles

Live demo

Create text variables

Code
title <- "Do older cats spend more time indoors?"
perc_indoor <- round(100 * sum(cats_reference$hrs_indoors == 22.5, na.rm = TRUE) / nrow(cats_reference))
st <- glue("Around {perc_indoor}% of cats in the study spend on average 22.5 hours per day indoors! There is a slight trend for cats to spend more time indoors as they age.")
cap <- "Data: McDonald JL, Cole H. 2020 | Graphic: Nicola Rennie"
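The percentage in the subtitle depends on which cats go in the denominator. The sketch below uses a made-up vector (not the study data) to compare dividing by all cats, as nrow() does, with dividing by only the cats that have a recorded value. It also shows the share of every category at once with prop.table().

```r
# Made-up hours-indoors values for ten cats; one is missing
hrs <- c(2.5, 7.5, 22.5, 22.5, 17.5, 22.5, NA, 12.5, 22.5, 7.5)

# Share of all cats (the missing value counts in the denominator)
perc_all <- round(100 * sum(hrs == 22.5, na.rm = TRUE) / length(hrs))

# Share of cats with a recorded value only
perc_known <- round(100 * mean(hrs == 22.5, na.rm = TRUE))

c(perc_all = perc_all, perc_known = perc_known)

# Share of every category at once (table() drops NA by default)
round(100 * prop.table(table(hrs)))
```

With one missing value out of ten, the two definitions give 40% and 44%, so it is worth deciding which one the subtitle should report.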

Add text to chart

Code
text_plot <- annotated_plot +
  labs(
    title = title,
    subtitle = st,
    caption = cap,
    x = "Age of cat (years)",
    y = "Average time spent indoors (hours per day)",
    size = "Number of cats"
  )

Final adjustments

  • What about the fonts we chose?

  • Where’s the subtitle text gone?!

  • The legend takes up a lot of space

  • That grey background…

Live demo

Apply default font and size

Code
theme_plot1 <- text_plot +
  theme_minimal(
    base_family = body_font,
    base_size = 10
  )

Further styling

Code
theme_plot2 <- theme_plot1 +
  theme(
    # legend styling (legend.position.inside requires ggplot2 3.5.0 or later)
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.25),
    legend.background = element_rect(fill = alpha(bg_col, 0.6), color = text_col),
    # text
    text = element_text(color = text_col),
    plot.title = element_text(family = title_font, face = "bold", size = rel(1.5)),
    # element_textbox_simple() from ggtext wraps long text to the available width
    plot.subtitle = element_textbox_simple(),
    plot.caption = element_textbox_simple(),
    # align the title and caption with the whole plot, not just the panel
    plot.title.position = "plot",
    plot.caption.position = "plot",
    # background and grid
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = bg_col, color = bg_col)
  )

Save as an image

Code
ggsave(
  filename = "cats.png",
  plot = theme_plot2,
  width = 6,
  height = 4,
  units = "in",
  dpi = 300
)

Reflection

  • The legend box and the annotation box currently look quite different.

  • If we had joined the GPS data, we could perhaps have made a more informative plot showing the relationship between age and activity level.

Want to keep going?

  • Add more annotations, with arrows pointing to the relevant data points.

  • Join the cats and the cats_reference datasets using the tag_id column.

  • Instead of plotting the time spent indoors on the y-axis, plot the cat’s average ground speed (ignoring that the values are very unusual!)

  • Add another annotation that highlights the cat with the highest average speed.
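For the joining and fastest-cat suggestions, one possible starting point is to average the GPS readings per cat and then join the reference data on tag_id. The sketch below runs on small hypothetical stand-ins for cats and cats_reference: the column names follow the TidyTuesday data, but the values and cat names are made up.

```r
library(dplyr)

# Hypothetical stand-ins: several GPS readings per cat (like cats)
# and one row per cat (like cats_reference)
gps_toy <- tibble(
  tag_id       = c("T1", "T1", "T2", "T2", "T3"),
  ground_speed = c(100, 300, 50, 150, 900)
)
ref_toy <- tibble(
  tag_id    = c("T1", "T2", "T3"),
  animal_id = c("Tom", "Luna", "Milo"),
  age_years = c(12, 5, 2)
)

# Average ground speed per cat, then attach the reference columns by tag_id
speed_by_cat <- gps_toy |>
  group_by(tag_id) |>
  summarise(mean_speed = mean(ground_speed, na.rm = TRUE)) |>
  left_join(ref_toy, by = "tag_id")

# The cat with the highest average speed, e.g. for an extra annotation
fastest <- slice_max(speed_by_cat, mean_speed, n = 1, with_ties = FALSE)
fastest$animal_id
```

Summarising before joining keeps one row per cat, so the reference columns are not repeated for every GPS reading.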

Resources