1 Introduction

1.1 Who is this book aimed at?

This book is primarily aimed at those who wish to develop their data visualization skills in R. A basic knowledge of R, and more specifically of the tidyverse ecosystem, will be useful, although all code used in the examples is fully explained. Readers do not need to be experienced in ggplot2. The book will also interest those who are already familiar with R (including ggplot2) and wish to develop their skills in designing data visualizations further, as well as those who already design data visualizations using other tools and want to learn how to do the equivalent in R.

1.2 What do you need to know before reading this book?

This book assumes you know the basics of using R and know what the tidyverse is. For example, you’ll be able to:

install and load packages
call a function and save the output to variables
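
As a rough self-check, the two skills listed above might look like the following in practice. This is a minimal sketch rather than code from a later chapter: the tools package is used only because it ships with R, and the file name "chart.png" is a made-up example.

```r
# Install a package from CRAN. This only needs doing once per machine,
# so it is commented out here; remove the leading '#' to run it.
# install.packages("ggplot2")

# Load an installed package for the current session.
# The tools package ships with R, so nothing needs installing first.
library(tools)

# Call a function and save the output to a variable.
# "chart.png" is a hypothetical file name used purely for illustration.
file_stem <- file_path_sans_ext("chart.png")

# Print the saved value
print(file_stem)
```

If both steps feel familiar, you should be comfortable with the code in the rest of the book.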

If you’re not that familiar with R, any of the following books should provide some good pre-reading:

R for Data Science (Second Edition) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. It can be found online at r4ds.hadley.nz (Wickham, Çetinkaya-Rundel, and Grolemund 2023).
R for the Rest of Us: A Statistics-Free Introduction by David Keyes. It can be found online at book.rfortherestofus.com (Keyes 2024).
Learn R: As a Language by Pedro J. Aphalo. It can be found online at www.learnr-book.info (Aphalo 2024).

1.3 Code style used in this book

There are many different versions of R, many different packages in R, and many different styles of writing code - and there isn’t a single best way of doing things. You may write code differently from the way you see it written in this book, and that’s perfectly fine. Throughout the book, a few choices have been made to keep code consistent:

In R, the pipe operator takes the thing on its left and passes it along to the function on its right (Wickham, Çetinkaya-Rundel, and Grolemund 2023). You can find a full description of the pipe operator in R for Data Science. The pipe (%>%) was first introduced to R via the magrittr package. Since version 4.1.0 of R, a version of the pipe (|>) has existed in base R. The base R version of the pipe is used throughout the book. Although there are some differences between the two versions of the pipe, in this book they can be used interchangeably.

In this book, we’ll primarily use base R for initial exploration and visualization, and then ggplot2 and its associated extension packages to create our final graphic in each chapter.
In R, functions can be loaded from packages in different ways: through loading the entire package via the library() function; using namespacing (the prefixing of functions with the package name and ::); or (since R version 4.5.0) through using the use() function. Namespacing is useful for two reasons: (i) from a learning perspective, it makes it easier to recognize where functions come from and how they connect together, and (ii) from a programming perspective, it reduces conflicts and errors - something we all want less of! However, to make this book more accessible to beginners and to ensure code works with earlier versions of R, we’ll be using library() throughout. Each chapter begins with a section discussing which packages will be used, and where functions come from is explained throughout.

Package names are identified in a bold, monospaced typeface to help differentiate them from functions with the same or similar names. For example, ggplot2.
All software requirements, including a complete list of package versions, can be found in the Appendix.
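
To make the pipe and package-loading conventions described above more concrete, here is a minimal sketch using only packages that ship with R. The values, the file name "chart.png", and the choice of the tools and parallel packages are assumptions made purely for illustration; they are not taken from later chapters. The pipe requires R 4.1.0 or later, and the use() call is guarded so that the script still runs on versions of R that do not provide it.

```r
# --- The pipe -----------------------------------------------------------
values <- c(1, 4, 9)

# Without the pipe: nested calls are read from the inside out
nested_result <- sum(sqrt(values))

# With the base R pipe: the left-hand side is passed as the first argument
# of the function on the right, so the steps read from left to right
piped_result <- values |> sqrt() |> sum()

# Both approaches give the same answer
identical(nested_result, piped_result)

# --- Loading functions from packages ------------------------------------
# Option 1 (the approach used in this book): attach the whole package
library(tools)
ext_attached <- file_ext("chart.png")

# Option 2: namespacing, which works without attaching the package
ext_namespaced <- tools::file_ext("chart.png")

# Option 3: use(), which attaches only the functions that are named.
# Skipped on versions of R where use() is not available.
if (exists("use", envir = baseenv(), inherits = FALSE)) {
  use("parallel", "detectCores")
}
```

Options 1 and 2 return the same value here. The difference lies in how explicit the code is about where file_ext() comes from, which is the trade-off described above.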

1.4 Code evolves over time

You may notice that some of the final images differ slightly from those initially created and published on social media. You might also find some small differences in the code used to produce them if you compare the contents of this book to the original scripts in the github.com/nrennie/tidytuesday GitHub repository. These differences are likely due to one of four reasons:

Packages have since been updated and code has been changed to use newer syntax. Many of the code changes relate to changes in ggplot2 version 3.5.0;
Some aspects have been omitted from a visualization to avoid explaining everything in the first chapter - but those aspects are all covered and linked to in later chapters;
After several years of practice, there may be more efficient ways of re-writing code from some of the earlier plots. Any changes are clearly labelled and discussed;
Some images may be different due to copyright reasons.

1.5 The structure of this book

There are four main sections in this book:

Common charts don’t need to be boring!: which teaches you how to make classic chart types, such as line charts and bar charts, more effective and more interesting.
Making use of icons, fonts, and text: where you’ll see different ways to load fonts into R, be able to use icons within charts, and use colored text as an alternative to a traditional legend.
Working with images: where you’ll see examples of loading and processing images in R, and learn how to add them to plots to create custom labels.
Visualizing spatial data: where you’ll learn how to manipulate spatial data, create choropleth maps and coordinate plots, and arrange small multiples in a geographic grid.

Each chapter is a case study of a different visualization, which follows roughly the same structure:

Data: an introduction to the dataset used in each chapter and how to load it into R. Across the chapters, you’ll see ways of loading data via R packages, local CSV files, APIs, and directly from URLs.
Exploratory work: exploring the structure of the dataset, identifying issues, and considering potential approaches to visualization.
Preparing a plot: performing the data wrangling needed to make the plot, and creating a first draft using basic ggplot2 functionality.
Advanced styling: editing the basic plot to make it of publication quality with custom styling, including fonts, colors, text, and legends.
Reflection: some thoughts on how the visualization created in each chapter may be improved, and what aspects of its design are successful.
Exercises: a few questions for following up on improving the visualizations. These exercises are purposefully left open-ended, rather than prescriptive questions with defined answers. You’re encouraged to think about how you would design and implement different solutions - sharing them on social media is optional!

There is an additional chapter at the end with some further tips and tricks for improving your plots.