The story of Goldilocks and the three geoms
Lecturer in Health Data Science at Lancaster University.
Academic background in statistics, and experience in data science consultancy.
Blog about R and data science at nrennie.rbind.io/blog.
An R package to create flowcharts using {ggplot2}.
Prompt: Storytelling
Category: Uncertainty
goldilocks <- tibble::tibble(
  from = c(
    "Goldilocks", "Porridge", "Porridge", "Porridge",
    "Just right", "Chairs", "Chairs", "Chairs",
    "Just right2", "Beds", "Beds", "Beds",
    "Just right3"
  ),
  to = c(
    "Porridge", "Too cold", "Too hot", "Just right",
    "Chairs", "Still too big", "Too big", "Just right2",
    "Beds", "Too soft", "Too hard", "Just right3",
    "Bears!"
  )
)
In R:
In Quarto:
geom_rect()
geom_text()
geom_path()
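To show how these three geoms combine, here is a minimal sketch that draws a two-node flowchart by hand with {ggplot2}. The node coordinates, box sizes, and object names (`nodes`, `edges`, `half_w`, `half_h`) are illustrative assumptions, not values taken from the package.

```r
library(ggplot2)

# Hypothetical node table: centre coordinates and labels (illustrative values)
nodes <- data.frame(
  name = c("Goldilocks", "Porridge"),
  x = c(0, 0),
  y = c(1, 0)
)

# Hypothetical half-width and half-height of each box
half_w <- 0.35
half_h <- 0.25

# One edge, from the bottom of the first box to the top of the second
edges <- data.frame(
  x = c(0, 0),
  y = c(1 - half_h, 0 + half_h),
  id = c(1, 1)
)

p <- ggplot() +
  # boxes
  geom_rect(
    data = nodes,
    mapping = aes(
      xmin = x - half_w, xmax = x + half_w,
      ymin = y - half_h, ymax = y + half_h
    ),
    fill = "white", colour = "black"
  ) +
  # labels
  geom_text(data = nodes, mapping = aes(x = x, y = y, label = name)) +
  # arrow
  geom_path(
    data = edges,
    mapping = aes(x = x, y = y, group = id),
    arrow = arrow(length = unit(0.3, "cm"), type = "closed")
  ) +
  theme_void()
```

Working out the box corners and arrow endpoints for every node by hand is the tedious part that a wrapper function can take over.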
Aims
Simple wrapper function to create a flowchart from a data.frame
Users don’t have to define node position
Works with existing {ggplot2} styling options
get_layout <- function(data, layout = "tree", node_data = NULL) {
  # some argument tests are in here ...
  if (layout == "tree") {
    # keep only the edge columns, then let {igraph} compute a tree layout
    # (all_of() replaces .data$ here, which is deprecated inside select())
    data <- dplyr::select(data, dplyr::all_of(c("from", "to")))
    g <- igraph::graph_from_data_frame(data, directed = TRUE)
    coords <- igraph::layout_as_tree(g)
    colnames(coords) <- c("x", "y")
    output <- tibble::as_tibble(coords) %>%
      dplyr::mutate(name = igraph::vertex_attr(g, "name"))
  } else if (layout == "custom") {
    # some argument tests are in here ...
    # use the x and y positions supplied in node_data
    output <- node_data %>%
      dplyr::select(dplyr::all_of(c("x", "y", "name")))
  }
  return(output)
}
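The tree branch above can be tried in isolation. The following sketch runs the same {igraph} steps on a small, made-up edge list. The object names (`edges`, `node_layout`) are illustrative, and the result is built with base `data.frame()` rather than a tibble to keep the example short.

```r
library(igraph)

# Small illustrative edge list (a fragment of the Goldilocks story)
edges <- data.frame(
  from = c("Goldilocks", "Porridge", "Porridge"),
  to = c("Porridge", "Too cold", "Too hot")
)

# Build a directed graph, then ask igraph for tree-layout coordinates
g <- graph_from_data_frame(edges, directed = TRUE)
coords <- layout_as_tree(g)
colnames(coords) <- c("x", "y")

# One row per node: its name plus the x and y position igraph chose
node_layout <- data.frame(name = vertex_attr(g, "name"), coords)
print(node_layout)
```

Each row of `node_layout` gives the centre of one box, so the user never has to pick coordinates.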
node_data
The data argument of ggflowchart() passes information about the edges. The optional node_data argument passes information about the nodes.
There are some special column names in node_data:
name (required)
x_nudge and y_nudge (optional, to change the width or height of individual nodes)
x and y (optional, to set node positions when using layout = "custom")
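As a rough illustration of the special columns, the sketch below builds a small edge list and a matching node table. The nudge values are made up, and the check assumes that `name` is matched against the labels in `from` and `to`.

```r
# Edge list: one row per arrow, as passed to the `data` argument
edges <- data.frame(
  from = c("Goldilocks", "Porridge", "Porridge"),
  to = c("Porridge", "Too cold", "Too hot")
)

# Node table: `name` is required; the nudge columns are optional.
# The values here are illustrative, not recommended settings.
node_data <- data.frame(
  name = c("Goldilocks", "Porridge", "Too cold", "Too hot"),
  x_nudge = c(0.45, 0.35, 0.35, 0.35),
  y_nudge = c(0.25, 0.25, 0.25, 0.25)
)

# Sanity check (assumption: names are matched against from/to):
# every node that appears in an edge should have a row in node_data
edge_names <- unique(c(edges$from, edges$to))
missing_nodes <- setdiff(edge_names, node_data$name)
stopifnot(length(missing_nodes) == 0)
```

With {ggflowchart} installed, this pair of tables would be passed as ggflowchart(edges, node_data = node_data).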
Then the feature requests came…
ggflowchart(
  data,
  node_data = NULL,
  layout = "tree",
  fill = "white",
  colour = "black",
  alpha = 1,
  text_colour = "black",
  text_size = 3.88,
  parse = FALSE,
  arrow_colour = "black",
  arrow_size = 0.3,
  arrow_linewidth = 0.5,
  arrow_linetype = "solid",
  arrow_label_fill = "white",
  family = "sans",
  x_nudge = 0.35,
  y_nudge = 0.25,
  horizontal = FALSE,
  color = NULL,
  text_color = NULL,
  arrow_color = NULL
)
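The last three arguments look like American-spelling aliases for colour, text_colour, and arrow_colour. How the package handles them is not shown here, so the helper below, `resolve_alias()`, is purely hypothetical: a sketch of one common way to let an alias override the British-spelling default when it is supplied.

```r
# Hypothetical helper, not part of {ggflowchart}: if the US-spelling
# argument was supplied (is not NULL), it takes precedence.
resolve_alias <- function(colour, color = NULL) {
  if (!is.null(color)) color else colour
}

resolve_alias("black")                 # no alias supplied
resolve_alias("black", color = "red")  # alias supplied
```

Every alias pair adds another argument to the signature, which is part of how a single function grows this long.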
Monolithic functions aren’t great…
Aim:
geom_nodes(): adds the rectangles and text
geom_edges(): adds the edges and edge labels
Issues to solve:
colour mapping for geom_nodes(): rectangle outlines or text?
custom scale_colour_* functions? or suggest {ggnewscale}?
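One way a node layer could be prototyped is as a function that returns a list of {ggplot2} layers, since a list can be added to a plot with `+`. The sketch below is hypothetical and is not the package's implementation: the function name `geom_nodes_sketch()`, its arguments, and the `nodes` table are all assumptions, with the default box size borrowed from the x_nudge and y_nudge defaults of ggflowchart().

```r
library(ggplot2)

# Hypothetical sketch: bundle the rectangle and text layers in one call.
# Expects `data` to have columns x, y (box centres) and name (labels).
geom_nodes_sketch <- function(data, half_w = 0.35, half_h = 0.25,
                              fill = "white", colour = "black",
                              text_colour = "black") {
  list(
    geom_rect(
      data = data,
      mapping = aes(
        xmin = x - half_w, xmax = x + half_w,
        ymin = y - half_h, ymax = y + half_h
      ),
      fill = fill, colour = colour,
      inherit.aes = FALSE
    ),
    geom_text(
      data = data,
      mapping = aes(x = x, y = y, label = name),
      colour = text_colour,
      inherit.aes = FALSE
    )
  )
}

# Illustrative node positions
nodes <- data.frame(
  name = c("Goldilocks", "Porridge"),
  x = c(0, 0),
  y = c(1, 0)
)

p <- ggplot() +
  geom_nodes_sketch(nodes, colour = "grey30", text_colour = "black") +
  theme_void()
```

Because the outline and the text are set as fixed values through separate arguments here, the sketch sidesteps the mapping question rather than answering it: a mapped colour aesthetic would still have to be assigned to one of the two layers.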