{ggflowchart}

The story of Goldilocks and the three geoms

About me

Lecturer in Health Data Science at Lancaster University.


Academic background in statistics, and experience in data science consultancy.


Blog about R and data science at nrennie.rbind.io/blog.

Photo of Lancaster Castle and canal

What is {ggflowchart}?

An R package to create flowcharts using {ggplot2}.


Install the package from CRAN using:

install.packages("ggflowchart")

or install the development version from GitHub:

remotes::install_github("nrennie/ggflowchart")

How {ggflowchart} started

It started with the #30DayChartChallenge…

#30DayChartChallenge prompts from 2022

Prompt: Storytelling


Category: Uncertainty

Goldilocks data

Goldilocks and the three bears illustration

goldilocks <- tibble::tibble(
  from = c(
    "Goldilocks", "Porridge", "Porridge", "Porridge",
    "Just right", "Chairs", "Chairs", "Chairs",
    "Just right2", "Beds", "Beds", "Beds",
    "Just right3"
  ),
  to = c(
    "Porridge", "Too cold", "Too hot", "Just right",
    "Chairs", "Still too big", "Too big", "Just right2",
    "Beds", "Too soft", "Too hard", "Just right3",
    "Bears!"
  )
)
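Each row of the edge list is one arrow in the flowchart. Node names appear to act as identifiers, which would explain why the repeated "Just right" nodes carry the suffixes 2 and 3. As a quick sanity check, the base R sketch below finds the start node (in from but never in to) and the end nodes (in to but never in from). It uses a shortened copy of the data so that it runs on its own; the names edges, start_node, and end_nodes are illustrative only.

```r
# Shortened copy of the Goldilocks edge list (porridge branch only).
edges <- data.frame(
  from = c("Goldilocks", "Porridge", "Porridge", "Porridge", "Just right"),
  to = c("Porridge", "Too cold", "Too hot", "Just right", "Chairs")
)

# A start node has outgoing arrows but no incoming ones.
start_node <- setdiff(edges$from, edges$to)

# End nodes have incoming arrows but no outgoing ones.
end_nodes <- setdiff(edges$to, edges$from)

start_node
end_nodes
```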

Other packages and tools for flowcharts

In R:

  • {grid}
  • {igraph}
  • {ggnetwork}
  • {GGally} (via ggnet2())
  • {ggraph}
  • {DiagrammeR}

In Quarto:

  • Mermaid
  • Graphviz
  • TikZ

It’s just rectangles, text, and lines…

geom_rect()

Flowchart nodes

geom_text()

Flowchart nodes with text

geom_path()

Flowchart nodes with text and lines
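The three geoms above can be combined by hand. The following is a minimal sketch, not the package's code: the node positions, the half-width and half-height values, and the object names (nodes, edges, half_w, half_h) are all made up for illustration.

```r
library(ggplot2)

# Hypothetical node centres for a two-node flowchart.
nodes <- data.frame(
  name = c("A", "B"),
  x = c(0, 0),
  y = c(1, 0)
)

# Half-width and half-height of each rectangle (arbitrary values).
half_w <- 0.35
half_h <- 0.25

# One edge, running from the bottom of A to the top of B.
edges <- data.frame(
  x = c(0, 0),
  y = c(1 - half_h, 0 + half_h)
)

p <- ggplot() +
  # Rectangles: the flowchart nodes.
  geom_rect(
    data = nodes,
    mapping = aes(
      xmin = x - half_w, xmax = x + half_w,
      ymin = y - half_h, ymax = y + half_h
    ),
    fill = "white", colour = "black"
  ) +
  # Text: the node labels, centred in each rectangle.
  geom_text(data = nodes, mapping = aes(x = x, y = y, label = name)) +
  # Lines: the arrows between nodes.
  geom_path(
    data = edges,
    mapping = aes(x = x, y = y),
    arrow = arrow(length = unit(0.3, "cm"), type = "closed")
  ) +
  theme_void()
p
```

Working out the rectangle corners and arrow endpoints by hand quickly gets tedious for more than a few nodes.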

The Goldilocks Decision Tree

flowchart example

…it also inspired the hex sticker!

ggflowchart hex sticker logo

Building {ggflowchart}

Aims

  • Simple wrapper function to create a flowchart from a data.frame

  • Users don’t have to define node position

  • Works with existing {ggplot2} styling options

A simple example

library(ggflowchart)
flow_data <- tibble::tribble(
  ~from, ~to,
  "A", "B",
  "A", "C",
  "A", "D",
  "B", "E",
  "C", "F",
  "F", "G"
)
ggflowchart(flow_data)

How {ggflowchart} works

Layouts with {igraph}

get_layout <- function(data, layout = "tree", node_data = NULL) {
  
  # some argument tests are in here ...

  if (layout == "tree") {
    # all_of() avoids the deprecated use of .data inside select()
    data <- dplyr::select(data, dplyr::all_of(c("from", "to")))
    g <- igraph::graph_from_data_frame(data, directed = TRUE)
    coords <- igraph::layout_as_tree(g)
    colnames(coords) <- c("x", "y")
    output <- tibble::as_tibble(coords) %>%
      dplyr::mutate(name = igraph::vertex_attr(g, "name"))
  } else if (layout == "custom") {
    
    # some argument tests are in here ...

    output <- node_data %>%
      dplyr::select(dplyr::all_of(c("x", "y", "name")))
  }
  return(output)
}
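To see what the tree layout step produces, the sketch below runs the same {igraph} calls on a small edge list outside of the package. The object names (flow_data, node_layout) are illustrative; the coordinates depend on {igraph}, so no output is shown.

```r
library(igraph)

# Edge list: one row per arrow, with columns `from` and `to`.
flow_data <- data.frame(
  from = c("A", "A", "A", "B", "C", "F"),
  to = c("B", "C", "D", "E", "F", "G")
)

# Build a directed graph, then ask igraph for tree-layout coordinates.
g <- graph_from_data_frame(flow_data, directed = TRUE)
coords <- layout_as_tree(g)
colnames(coords) <- c("x", "y")

# Attach node names so the coordinates can be joined to other node data.
node_layout <- data.frame(
  name = vertex_attr(g, "name"),
  coords
)
node_layout
```

The result has one row per node, with the x and y columns giving the centre of each node.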

Adding attributes with node_data

The data argument of ggflowchart() passes information about the edges. The optional node_data argument passes information about the nodes.


There are some special column names in node_data:

  • name (required)
  • x_nudge and y_nudge (optional; change the width or height of individual nodes)
  • x and y (only needed with layout = "custom", where they set the node positions)
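A minimal sketch of a custom layout, assuming the installed version of {ggflowchart} includes the layout argument (this may require the development version). The coordinates are arbitrary and chosen only to place the nodes on a diagonal.

```r
library(ggflowchart)

flow_data <- tibble::tribble(
  ~from, ~to,
  "A", "B",
  "B", "C"
)

# x and y give the position of each node; the values here are arbitrary.
node_data <- tibble::tribble(
  ~name, ~x, ~y,
  "A", 0, 2,
  "B", 1, 1,
  "C", 2, 0
)

p <- ggflowchart(flow_data, node_data, layout = "custom")
p
```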


# inside ggflowchart(): compute node positions first
node_layout <- get_layout(
  data = data,
  layout = layout,
  node_data = node_data
)

add_node_attr <- function(node_layout, node_data) {
  
  # some argument tests are in here ...
  
  node_layout <- dplyr::left_join(node_layout, node_data, by = "name")
  return(node_layout)
}
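Because the attributes are attached with a left join on name, every node in the layout is kept, and any node without a matching row in node_data gets NA for the extra columns. A small sketch with made-up data (the coordinates and the type column are illustrative):

```r
library(dplyr)

# Layout output: one row per node (coordinates here are made up).
node_layout <- tibble::tibble(
  name = c("A", "B", "C"),
  x = c(0, 0, 0),
  y = c(2, 1, 0)
)

# User-supplied attributes, deliberately missing node "C".
node_data <- tibble::tibble(
  name = c("A", "B"),
  type = c("Yes", "No")
)

# All three nodes survive; "C" has type = NA.
joined <- left_join(node_layout, node_data, by = "name")
joined
```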


flow_data <- tibble::tribble(
  ~from, ~to,
  "A", "B",
  "B", "C"
)
node_data <- tibble::tribble(
  ~name, ~type, ~x_nudge,
  "A", "Yes", 0.3,
  "B", "Yes", 0.4,
  "C", "No", 0.5
)
ggflowchart(flow_data, node_data, fill = type)
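Since the result is a {ggplot2} plot, the usual scale and theme functions can be added with +. A minimal sketch, with arbitrary colour choices:

```r
library(ggflowchart)
library(ggplot2)

flow_data <- tibble::tribble(
  ~from, ~to,
  "A", "B",
  "B", "C"
)
node_data <- tibble::tribble(
  ~name, ~type,
  "A", "Yes",
  "B", "Yes",
  "C", "No"
)

# Map fill to the `type` column, then pick the colours and hide the legend.
p <- ggflowchart(flow_data, node_data, fill = type) +
  scale_fill_manual(values = c(Yes = "#b2d183", No = "#d18383")) +
  theme(legend.position = "none")
p
```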

Plotting with {ggplot2}

ggplot() +
  geom_rect(...) +
  geom_text(...) +
  geom_path(...) +
  theme_void(...)

ggplot2 hex sticker

What’s next for {ggflowchart}?

More (and better) examples in vignettes

Screenshot of GitHub issue for documentation

Adding circular nodes

Screenshot of GitHub issue for circular nodes

Adding geoms

It started out with two arguments…

ggflowchart(data, node_data = NULL)

Then the feature requests came…

ggflowchart(
  data,
  node_data = NULL,
  layout = "tree",
  fill = "white",
  colour = "black",
  alpha = 1,
  text_colour = "black",
  text_size = 3.88,
  parse = FALSE,
  arrow_colour = "black",
  arrow_size = 0.3,
  arrow_linewidth = 0.5,
  arrow_linetype = "solid",
  arrow_label_fill = "white",
  family = "sans",
  x_nudge = 0.35,
  y_nudge = 0.25,
  horizontal = FALSE,
  color = NULL,
  text_color = NULL,
  arrow_color = NULL
)


Monolithic functions aren’t great…

Aim:

  • geom_nodes(): adds the rectangles and text

  • geom_edges(): adds the edges and edge labels
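One possible way to split things up, sketched under heavy assumptions: the helpers below are hypothetical, are not part of {ggflowchart}, and are plain functions that return lists of existing layers rather than true ggproto geoms. Edge labels are left out to keep the sketch short.

```r
library(ggplot2)

# HYPOTHETICAL helper: returns the rectangle and text layers as a list,
# which ggplot2 can add to a plot with `+`.
# `nodes` needs columns name, x, y (node centres).
geom_nodes_sketch <- function(nodes, half_w = 0.35, half_h = 0.25) {
  list(
    geom_rect(
      data = nodes,
      mapping = aes(
        xmin = x - half_w, xmax = x + half_w,
        ymin = y - half_h, ymax = y + half_h
      ),
      fill = "white", colour = "black"
    ),
    geom_text(data = nodes, mapping = aes(x = x, y = y, label = name))
  )
}

# HYPOTHETICAL helper: `edges` needs columns id, x, y,
# with one id per edge and one row per point along it.
geom_edges_sketch <- function(edges) {
  list(
    geom_path(
      data = edges,
      mapping = aes(x = x, y = y, group = id),
      arrow = arrow(length = unit(0.3, "cm"))
    )
  )
}

nodes <- data.frame(name = c("A", "B"), x = c(0, 0), y = c(1, 0))
edges <- data.frame(id = c(1, 1), x = c(0, 0), y = c(0.75, 0.25))

p <- ggplot() +
  geom_nodes_sketch(nodes) +
  geom_edges_sketch(edges) +
  theme_void()
p
```

Splitting the layers this way would let users style nodes and edges separately instead of passing everything through one long argument list.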


Issues to solve:

  • colour mapping for geom_nodes(): rectangle outlines or text?

  • custom scale_colour_* functions? or suggest {ggnewscale}?
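To illustrate the {ggnewscale} option, the sketch below gives the rectangle outlines and the text their own colour scales in a hand-built plot. It is an assumption about how this could look, not a decision made for the package; the columns outline_group and text_group and the colours are made up.

```r
library(ggplot2)
library(ggnewscale)

# Made-up node data with separate grouping columns for outlines and text.
nodes <- data.frame(
  name = c("A", "B"),
  x = c(0, 0),
  y = c(1, 0),
  outline_group = c("start", "end"),
  text_group = c("important", "minor")
)

p <- ggplot(nodes) +
  # First colour scale: rectangle outlines.
  geom_rect(
    aes(
      xmin = x - 0.35, xmax = x + 0.35,
      ymin = y - 0.25, ymax = y + 0.25,
      colour = outline_group
    ),
    fill = "white"
  ) +
  scale_colour_manual(
    name = "Outline",
    values = c(start = "darkgreen", end = "firebrick")
  ) +
  # Start a fresh colour scale for the layers that follow.
  new_scale_colour() +
  # Second colour scale: node text.
  geom_text(aes(x = x, y = y, label = name, colour = text_group)) +
  scale_colour_manual(
    name = "Text",
    values = c(important = "black", minor = "grey50")
  ) +
  theme_void()
p
```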

Contact