Writing Better R Code

Dr Nicola Rennie

Welcome!

Who am I?

Lecturer in Health Data Science within the Centre for Health Informatics, Computing, and Statistics.


Academic background in statistics, with experience in data science consultancy and training.


Using R for over 10 years, and author of multiple R packages.

CHICAS logo

Workshop outline

  • 09:30 - 09:45: Welcome and set up
  • 09:45 - 10:00: Projects and relative file paths
  • 10:00 - 10:45: Organising and styling a script
  • 10:45 - 11:00: BREAK
  • 11:00 - 11:30: Multiple scripts and folders
  • 11:30 - 12:00: Other useful tips

What to expect during this workshop

  • Combines slides, live coding examples, and exercises for you to participate in.

  • Ask questions throughout!

  • I hope you end up with more questions than answers after this workshop!


Stranger Things questions gif

Source: giphy.com


What this workshop isn’t

This isn’t prescriptive. It’s a set of suggestions for making your code clearer and more reproducible.


I don’t always follow my own advice…

Workshop resources

Course website: nrennie.rbind.io/training-better-r-code

Screenshot of course website

What’s this session about?




A talk about things that would have made my life so much easier if I’d known them five years ago.

Who are you?

  • You’re (reasonably) comfortable writing R code to do some analysis

But…

  • Your scripts are getting longer and longer
  • There are R scripts everywhere
  • You don’t want other people to see the code you’ve written because it’s all a bit of a mess!

Why do we care?

  • Writing code that is readable and understandable is something that future you will be grateful for.

  • Writing code that is readable and understandable is something that other people will be grateful for.

    • Sharing code prevents duplicating work.
    • It makes work easier to replicate.
    • Some journals may require analysis code to be shared.

Projects and relative file paths

Problem: You store data in a different place

data <- read_csv("C:/Users/username/OneDrive/R code/LOS.csv")


Or maybe…

setwd("C:/Users/username/OneDrive/R code")
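A minimal sketch of the relative-path alternative, assuming the project folder has a `data/` sub-folder containing `LOS.csv`. Opening the `.Rproj` file sets the working directory to the project folder, so the path only has to say where the file sits inside the project, and `setwd()` is not needed. So that the sketch runs anywhere, it first writes a small made-up `LOS.csv` (the `id` and `los` columns are hypothetical). In a real project the file would already be there. Base R's `read.csv()` is used to avoid extra packages; `readr::read_csv()` takes the same relative path.

```r
# Hypothetical example data, only written so this sketch runs anywhere.
# In a real project, data/LOS.csv would already be in the project folder.
dir.create("data", showWarnings = FALSE)
write.csv(
  data.frame(id = 1:3, los = c(2, 5, 1)),
  file.path("data", "LOS.csv"),
  row.names = FALSE
)

# Relative path: resolved from the working directory, which is the
# project folder when the project is opened via its .Rproj file.
los_data <- read.csv(file.path("data", "LOS.csv"))
los_data
```

Nothing in the path refers to `C:/Users/username`, so the same line works on a collaborator's computer as long as they open the project the same way.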

Where do we put files?

  • Let’s make our own lives easier by keeping our files organised!

  • Organised files also make it easier for R to find our files (e.g. data).

  • An easy way to do this is to use an R project:

    • Use a new R project for each analysis project
    • Double click on the project file to open RStudio with your files in the right place.

R projects

R Projects are a special type of file with a .Rproj extension that makes it easier for you to keep all of the data, code, and images for a project in one place.

Open up RStudio, then click File –> New Project –> New Directory –> New Project.

  • Type in the name that you want to call your new folder e.g. R Workshop. Then use Browse to select where on your computer you want to make the folder. IMPORTANT: remember where this is!

  • Finally, click Create Project. Your new folder will be created and opened in RStudio - sometimes it can take a couple of minutes.

Project folders

Use R Projects

Keep everything related to your analysis together, and easy for R to find by using projects.


R
│   messy_example_script.R
│   R scripts for other project 1.R
│   R scripts for other project 2.R
│   R scripts for other project 3.R
project folder
│   messy_example_script.R
│   data.csv
│   project_name.Rproj

Problem: Objects don’t seem to exist

Where do you write your code?

  • Console?

  • .R Script?

  • R Markdown file?

Environments


Save your code

You should store the code that creates any object that exists in your Global Environment - otherwise your analysis isn’t reproducible!

Preventing bad habits

But it takes a long time to create an object…

  • Create a script that creates the object.

  • Save the object as an RDS file.

  • In further analysis, load the RDS object.
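A minimal sketch of that pattern with base R's `saveRDS()` and `readRDS()`. The linear model on the built-in `mtcars` data is only a stand-in for a slow computation, and writing to `tempdir()` keeps the example self-contained. In a project you would more likely save to a sub-folder such as `outputs/` (a hypothetical folder name).

```r
# Script 1: run once, because creating the (stand-in) object is slow
slow_model <- lm(mpg ~ wt + hp, data = mtcars)
model_file <- file.path(tempdir(), "slow_model.rds")
saveRDS(slow_model, model_file)

# Script 2: later analysis loads the saved object instead of recreating it
slow_model_loaded <- readRDS(model_file)
coef(slow_model_loaded)
```

Keep the script that creates the object alongside the `.rds` file, so the object can always be regenerated from code.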

Live Demo!

  • Creating a project folder

  • Downloading data

  • Project settings

Exercise 1

  1. Create an R project for today’s workshop - name it something sensible!

  2. Download the messy_exercises_script.R script from nrennie.rbind.io/training-better-r-code/exercises.html and the hypoxia.csv data.

  3. Add those files to your R project in a sensible way.

  4. Edit the Global / Project options appropriately.

  5. Edit the script to use relative file paths.

10:00

Organising a script

Adding comments

  • Add comments using a # in R (on a separate line).

  • Comments don’t need to explain what your code does.

  • Comments should explain why you did it.

Adding comments

library(dplyr)

starwars |> 
  summarise(
    mean_height = mean(height, na.rm = TRUE), # calc height mean 
    sd_height = sd(height, na.rm = TRUE) # calc height sd
  )


# calculate summary statistics and 
# remove NA values as missing `height`
# values are also missing `mass`
starwars |> 
  summarise(
    mean_height = mean(height, na.rm = TRUE),
    sd_height = sd(height, na.rm = TRUE)
  )

Sections and subsections

You can add sections and subsections to code:

# Load data ---------------------------------------

## Geospatial files -------------------------------

## Population files -------------------------------

Structuring scripts

  • All library() calls at the start - only load the packages you actually need!

  • Don’t add install.packages() to a script - run it in the console!

  • Break it down into big steps - give sections useful names!

  • Sections aren’t the only things that should be well-named - variables and functions too!
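Putting these points together, a short script might be laid out as below. This is an illustrative sketch, not a template from the workshop: it uses the built-in `mtcars` data so it runs without any downloads, and the section and variable names are made up for the example.

```r
# Packages -----------------------------------------------------------------
# Only the packages this script uses; install them from the console, not here.
# ({stats} is attached by default; it is shown only to mark where library() goes.)
library(stats)

# Load data ----------------------------------------------------------------
cars_raw <- datasets::mtcars

# Wrangle data -------------------------------------------------------------

## Select variables ---------------------------------------------------------
cars_model_data <- cars_raw[, c("mpg", "wt", "hp")]

## Convert units ------------------------------------------------------------
# mtcars records weight in 1000 lbs; convert to metric tonnes (1000 lbs ~ 0.4536 t)
cars_model_data$wt_tonnes <- cars_model_data$wt * 0.4536

# Model --------------------------------------------------------------------
mpg_model <- lm(mpg ~ wt_tonnes + hp, data = cars_model_data)
summary(mpg_model)
```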

Live Demo!

  • Restructuring scripts

  • Adding sections

  • Renaming variables

  • Namespacing?
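Namespacing means calling a function as `package::function()`. It makes explicit which package a function comes from, which matters when two packages export functions with the same name: both {stats} and {dplyr} have a `filter()`, and loading {dplyr} masks the {stats} version. A small base-R sketch that calls `stats::filter()` explicitly:

```r
# stats::filter() applies a linear filter; here, a centred 3-point moving average.
# Writing stats:: makes it clear this is not dplyr::filter(), which subsets rows.
smoothed <- stats::filter(1:5, rep(1 / 3, 3))
as.numeric(smoothed)
```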

Exercise 2

  1. Reorganise the messy R script by adding sections and subsections.

  2. Edit the comments in the script to make them more useful.

  3. Rename variables and functions if you think they need to be renamed.

10:00

Styling scripts

Code style

This code runs without errors but…

starwars |> filter(height>100) |>select(eye_color, mass)|> group_by(eye_color) |>summarise(mean_mass =mean(mass, na.rm = T))


this is the same code:

starwars |> 
  filter(height > 100) |> 
  select(eye_color, mass) |> 
  group_by(eye_color) |> 
  summarise(mean_mass = mean(mass, na.rm = TRUE))

Linting

Linting - analysing source code for:

  • stylistic issues e.g. x<-3 vs x <- 3
  • common errors e.g. mean(x, na.rm = T, na.rm = F)
  • missing packages

In R, linting is performed by the {lintr} package.

{lintr}

Run lintr::lint("file.R"):
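As a quick sketch, assuming a recent version of {lintr} (the `text` argument is not available in older releases), you can lint a single line of code before pointing `lint()` at a whole script:

```r
library(lintr)

# Lint code supplied as text rather than a file path.
# The missing spaces around <- should be reported by the default linters.
lints <- lint(text = "x<-3")
lints
```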

Keyboard shortcuts

Use keyboard shortcuts to lint the current file (or package).

Styling

{lintr} tells you what’s wrong, but doesn’t fix it.

The {styler} R package will style your code for you.
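For example, `styler::style_text()` restyles code supplied as a string, which is a quick way to see what {styler} would change before running `style_file()` on a whole script:

```r
library(styler)

# Restyle a single line: styler adds spaces around the assignment arrow.
styled <- style_text("x<-3")
styled
```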

Keyboard shortcuts

Add a keyboard shortcut for styler::style_active_file()!

Note: {styler} doesn’t fix all issues found by {lintr}.

Live Demo!

  • Linting code

  • Styling code

  • Adding keyboard shortcuts

Exercise 3

  1. Install and load the lintr and styler packages if you don’t already use them.

  2. Run lint() on the messy R script. Do you understand all of the messages?

  3. Run style_file(). What has changed in your script?

  4. Re-run lint() on the script. Have all of the issues been fixed? If not, manually implement changes to the file.

  5. Bonus: Add an RStudio keyboard shortcut for style_active_file().

10:00

Multiple scripts and folders

Breaking up a single file

Imagine a directory structure like this:

project
│   Rscript.R

that contains all of the code for your analysis.

This is fine but:

  • it’s not great if Rscript.R is 4,000 lines long.
  • sections and subsections are great, but sometimes they aren’t enough.
  • it’s not a very descriptive name.
  • it’s a script that probably does lots of different things.


Multiple files

Okay names

project
│   data wrangling.R
│   load data.R
│   modelling.R
│   packages.R
│   plots.R
│   plots2.R

Better names

project
│   00_packages.R
│   01_load_data.R
│   02_data_wrangling.R
│   03_exploratory_plots.R
│   04_modelling.R
│   05_final_plots_tables.R

Multiple files

Naming files

  • Prefix with numbers to give them an order (add leading zeros).
  • Give them sensible, descriptive names.
  • Avoid spaces (computers prefer - or _).

Note: similar rules apply for variable and function names.


We’ll come back to avoiding analysis_final.R and analysis_final_final.R later!
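Why the leading zeros matter: `sort()` in R compares file names as text, character by character, so without padding `10_...` is placed before `2_...`. A small sketch using hypothetical file names:

```r
# Without leading zeros, "10_..." sorts before "2_..." because
# the character "1" comes before "2".
unpadded <- c("1_load_data.R", "2_data_wrangling.R", "10_report.R")
sort(unpadded)

# With zero-padded prefixes (generated here with sprintf()), text order
# matches the intended run order.
padded <- sprintf(
  "%02d_%s",
  c(1, 2, 10),
  c("load_data.R", "data_wrangling.R", "report.R")
)
sort(padded)
```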

Multiple folders

Often, you don’t just have R code for a project…

project
│   00_packages.R
│   01_load_data.R
│   02_data_wrangling.R
│   03_exploratory_plots.R
│   04_modelling.R
│   05_final_plots_tables.R
│   data.csv
│   residuals.png
│   outcome_by_age.png
│   outcome_by_occupation.png

Multiple folders

… so don’t just organise your R code!

project
│   project.Rproj
│   README.md
└───data
│   │   data.csv
└───plots
│   │   residuals.png
│   │   outcome_by_age.png
│   │   outcome_by_occupation.png
└───R
│   │   00_packages.R
│   │   01_load_data.R
│   │   02_data_wrangling.R
│   │   03_exploratory_plots.R
│   │   04_modelling.R
│   │   05_final_plots_tables.R

R script dependencies

project
└───R
│   │   00_packages.R
│   │   01_load_data.R
│   │   02_data_wrangling.R
│   │   03_exploratory_plots.R
│   │   04_modelling.R
│   │   05_final_plots_tables.R

  • Script 01 depends on 00
  • Script 02 depends on 01 (and 00)
  • Script 03 depends on 02 (and 01 and 00)
  • Script 04 depends on 02 (and 01 and 00, but not 03)
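One way to respect that order is a small runner script that `source()`s the numbered files one after another. The sketch below is hypothetical: the runner (you might call it `run_all.R`) and the two tiny scripts are made up, and they are written to a temporary folder so the example runs anywhere. `02_data_wrangling.R` only works because `01_load_data.R` has already created `raw_data`.

```r
# Hypothetical two-script project, written to a temporary folder.
project_dir <- file.path(tempdir(), "project")
dir.create(file.path(project_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  "raw_data <- datasets::mtcars",
  file.path(project_dir, "R", "01_load_data.R")
)
writeLines(
  "heavy_cars <- raw_data[raw_data$wt > 3, ]  # needs raw_data from script 01",
  file.path(project_dir, "R", "02_data_wrangling.R")
)

# Runner: source the numbered scripts in order. The zero-padded
# prefixes mean that sorting the file names gives the run order.
scripts <- sort(list.files(
  file.path(project_dir, "R"),
  pattern = "^[0-9]{2}_.*\\.R$",
  full.names = TRUE
))
for (script in scripts) {
  message("Running ", basename(script))
  source(script)
}
```

This re-runs every script every time, even when nothing upstream has changed.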

Documentation

Write this stuff down (in a README.md file)!

Live Demo!

  • Creating multiple folders

  • Creating multiple files

  • Dependencies of files

Exercise 4

  1. Re-organise your R project folder with multiple sub-directories.

  2. Split your messy R script into multiple, appropriately named files.

  3. What order should the scripts run in, and what does each one depend on?

10:00

Other useful tips

Problem: your coauthor uses a different package version

renv:

  • isolates each project's package environment

  • pins specific R package versions to each project

  • makes it easier to use the same package versions as collaborators

renv logo

Problem: your coauthor uses a different R version

rig:

  • manage multiple versions of R

  • switch between different R versions for different projects

r logo

Problem: your coauthor uses a different R version

rix:

  • uses Nix, a package manager focused on reproducible builds

  • creates project-specific environments with a custom version of R, its packages, and all system dependencies

rix logo

Problem: copying and pasting values into a paper

Quarto (or R Markdown):

  • Combine code with narrative text.

  • Fully-reproducible documents.

  • When the document re-renders, all figures and values get updated.

Quarto logo

Problem: forgetting which scripts to re-run

targets:

  • watches the dependencies of your workflow
  • skips steps whose code, data, and upstream dependencies have not changed
  • unlike the source("script.R") approach, it also manages changes to data
  • visualise the dependencies using tar_visnetwork()

targets logo
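A minimal `_targets.R` sketch, assuming the `hypoxia.csv` file from the exercises sits in a `data/` folder (the target names are illustrative). Each `tar_target()` names a step and the R expression that produces it; `format = "file"` tells {targets} to watch the file itself, so editing the data invalidates everything downstream. The pipeline is run with `tar_make()`, and `tar_visnetwork()` draws the dependency graph.

```r
# _targets.R
library(targets)

pipeline <- list(
  # Track the raw file itself, so changes to the data trigger re-runs
  tar_target(hypoxia_file, "data/hypoxia.csv", format = "file"),
  # Depends on hypoxia_file, so it is rebuilt whenever the file changes
  tar_target(hypoxia_data, read.csv(hypoxia_file)),
  # Depends only on hypoxia_data
  tar_target(ahi_summary, summary(hypoxia_data$AHI))
)

# _targets.R must end with the list of targets
pipeline
```

Defining the targets does not run anything; `tar_make()` builds only the targets that are missing or out of date.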

Problem: I need help with my code

reprex: reproducible example (small, rigorous, self-contained example)

  • Makes the problem more specific

  • Makes the problem reproducible

  • Makes you think more clearly about programming

Problem: I need help with my code

Help me, help you

Create a reprex of your programming problem.

The {reprex} package in R makes it easier to create a reproducible example.

You can also include session information, e.g. package versions.

reprex logo
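A sketch of calling {reprex} directly. The code inside the braces is what gets run and rendered; `session_info = TRUE` appends the session information, and `html_preview = FALSE` skips the preview window. The rendered Markdown is returned invisibly and, where a clipboard is available, copied ready to paste into an issue or forum post. Rendering relies on {rmarkdown}, which needs pandoc (bundled with RStudio).

```r
library(reprex)

out <- reprex(
  {
    # Built-in data, so whoever reads the reprex can run it too
    summary(mtcars$mpg)
  },
  session_info = TRUE,
  html_preview = FALSE
)
```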

…and I can’t share data

  • Are there built-in data sets that you can use to reproduce the problem e.g. mtcars?

  • Can you make a small, synthetic data set?

The tribble() function makes it easier to write data sets, row by row:

small_example <- tibble::tribble(
  ~A, ~B,
  1, 4.7,
  5, 2.2, 
  2, 9.8
)

Live Demo!

  • How to make a reprex

Exercise 5

  1. The following code doesn’t work. Imagine you are not allowed to share the hypoxia data with anyone else. Build a reprex that you could share with someone else.

library(ggplot2)
ggplot(hypoxia) +
  geom_col(aes(AHI, fill = Smoking))

10:00

Workshop resources

Course website: nrennie.rbind.io/training-better-r-code

Screenshot of course website

Feedback



Feedback form: forms.gle/nCutoyKy1GbnEpKq5