The Art of Data Visualization with ggplot2

Author

Nicola Rennie

Preface

Welcome to the online version of The Art of Data Visualization with ggplot2, also known as The TidyTuesday Cookbook, by Nicola Rennie.

This book is currently a work in progress and will be published by CRC Press.

Welcome to The Art of Data Visualization with ggplot2 (also known as The TidyTuesday Cookbook). TidyTuesday is a weekly social data project that aims to make learning to work with data easier by providing real-world datasets. Participants are encouraged to explore the data shared via GitHub each week, create an output such as a data visualization, and share that output, alongside their code, with the community.

After three years of weekly contributions, I’ve worked with around 150 datasets and created over 150 data visualizations. Each chapter of this book covers a different data visualization, showing: the data exploration process; the choice of data visualization type; the initial design ideas with hand-drawn sketches; the first build of a plot; and the iterative process of styling plots. For each plot, full R code is provided and explained at each step of the creative process. None of the visualizations you’ll see were created specifically for this book: each one is an original data visualization created from a real TidyTuesday dataset over the course of several years. Think of each chapter as a case study that starts with a new dataset you’ve never seen before and works through a process to reach an insightful, artistic visualization. This book is all about that process. Sometimes the data is messy. Sometimes the code is hacky. Sometimes, upon reflection, the data could be visualized better.

Data visualization can be a very effective and efficient means of communicating information. Visualizing your data typically serves one of two purposes: (i) as part of exploratory analysis, to help uncover discrepancies in data and identify interesting relationships to measure; or (ii) to communicate key insights and messages to a broader audience. The case-study nature of this book means that we’ll talk about both of these aspects, though we’ll focus mostly on the second. Choosing an appropriate type of visualization and making careful choices about design can clarify the message you are trying to convey to a reader. That does not necessarily mean that every chart must follow a set of rules and stick to a rigid format. Instead, data visualization is a blend of science and creativity: many of the landmark data visualizations held up as excellent examples don’t fit into the standard categories of bar charts, scatter plots, or line graphs.

That being said, the visualizations in this book are not always the most effective choice of visualization for the data and relationship shown. Rather, this book aims to show you examples of the end-to-end process of creating data visualizations, with a focus on the technical details of building them in R. You’ll see some non-standard solutions and unusual ideas that you can use to transform your data visualizations.

Acknowledgements

This book would never have been possible if I hadn’t stumbled upon TidyTuesday several years ago, and so I’m exceptionally grateful to the team behind it. In particular, I’d like to thank Jon Harmon, who has spent much of his own free time building a wonderful community of data science folks and maintaining the TidyTuesday datasets alongside other open source projects. Special thanks also to Tom Mock, Tracy Teal, Lydia Gibson, and Tan Ho for their work in supporting TidyTuesday and the wider Data Science Learning Community. Many thanks to those who have curated and submitted datasets for TidyTuesday over the years.

More broadly, the R and Data Visualization communities have been a source of inspiration, support, and odd bits of knowledge both before and during the writing of this book. I’m sure they will continue to be afterwards.

Thanks also to Emanuele Giorgi and Claudio Fronterrè for their help in reviewing early drafts, and for their encouragement throughout the process of writing this book. Thank you to CRC Press for agreeing to publish this book, and special thanks to Lara Spieker for her guidance and support.

In short, I’m very grateful for the big, nerdy data community around me.