Lecturer in Health Data Science within Centre for Health Informatics, Computing, and Statistics.
Academic background in statistics, with experience in data science consultancy and training.
Using R for over 10 years, and author of multiple R packages.
Combines slides, live coding examples, and exercises for you to participate in.
Ask questions throughout!
I hope you end up with more questions than answers after this workshop!
What this workshop isn’t
This isn’t prescriptive. These are suggestions for making your code clearer and more reproducible.
I don’t always follow my own advice…
Course website: nrennie.rbind.io/training-better-r-code
A talk about things that would have made my life so much easier if I’d known them five years ago.
But…
Writing code that is readable and understandable is something that future you will be grateful for.
Writing code that is readable and understandable is something that other people will be grateful for.
Let’s make our own lives easier by keeping our files organised!
Organised files also make it easier for R to find our files (e.g. data).
An easy way to do this is using an R project
R Projects are a special type of file with a .Rproj extension that makes it easier for you to keep all of the data, code, and images for a project in one place.
Open up RStudio, then click File -> New Project -> New Directory -> New Project.
Type in the name that you want to call your new folder, e.g. R Workshop. Then use Browse to select where on your computer you want to make the folder. IMPORTANT: remember where this is!
Finally, click Create Project. Your new folder will be created and opened in RStudio - sometimes it can take a couple of minutes.
Use R Projects
Keep everything related to your analysis together, and easy for R to find by using projects.
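As a minimal sketch of what this makes possible: inside a project, file paths can be written relative to the project folder rather than to one person's computer. The data sub-folder used here is an assumed layout, not something the workshop prescribes.

```r
# Hypothetical layout: <project folder>/data/hypoxia.csv
# file.path() builds the path from its parts, relative to the project root.
data_path <- file.path("data", "hypoxia.csv")

if (file.exists(data_path)) {
  hypoxia <- read.csv(data_path)
} else {
  # Printing the working directory helps when R cannot find the file
  message("No file at ", data_path, " relative to ", getwd())
}
```

Because the path does not mention a user name or drive letter, the same script should run unchanged on a collaborator's machine once they open the project.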
Where do you write your code?
Console?
.R Script?
R Markdown file?
Save your code
You should store the code that creates any object that exists in your Global Environment - otherwise your analysis isn’t reproducible!
Create a script that creates the object.
Save the object as an RDS file.
In further analysis, load the RDS object.
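The three steps above can be sketched with base R's saveRDS() and readRDS(). The model and the file name are illustrative stand-ins, and a temporary folder is used only so the example runs anywhere; in a real project you would save into the project folder.

```r
# Step 1: a script creates the object (a simple model stands in for a slow step)
model_fit <- lm(mpg ~ wt, data = mtcars)

# Step 2: save the object as an RDS file
rds_path <- file.path(tempdir(), "model_fit.rds")
saveRDS(model_fit, rds_path)

# Step 3: in further analysis, load the RDS object instead of re-creating it
model_fit_reloaded <- readRDS(rds_path)
coef(model_fit_reloaded)
```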
Creating a project folder
Downloading data
Project settings
Create an R project for today’s workshop - name it something sensible!
Download the messy_exercises_script.R script from nrennie.rbind.io/training-better-r-code/exercises.html and the hypoxia.csv data.
Add those files to your R project in a sensible way.
Edit the Global / Project options appropriately.
Edit the script to use relative file paths.
Add comments using a # in R (on a separate line).
Comments don’t need to explain what your code does.
Comments should explain why you did it.
All library() calls at the start - only load the packages you actually need!
Don’t add install.packages() to a script - run it in the console!
Break it down in big steps - give sections useful names!
Sections aren’t the only things that should be well-named - variables and functions too!
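A short sketch of how these suggestions might look together in one script. The metric-units reason in the comment is a hypothetical "why", the variable names are illustrative, and the trailing dashes follow the RStudio convention for marking code sections.

```r
# Load packages ----
# Only what this script uses; install.packages() stays in the console.
library(stats)

# Prepare data ----
# mtcars records weight in units of 1000 lbs. Convert to kilograms because
# the (hypothetical) report this feeds into uses metric units.
kg_per_lb <- 0.45359237
car_weights <- data.frame(
  model = rownames(mtcars),
  weight_kg = mtcars$wt * 1000 * kg_per_lb
)

# Summarise ----
mean_weight_kg <- mean(car_weights$weight_kg)
```

Note that the comments say why the conversion happens, while names such as car_weights and mean_weight_kg already say what each object holds.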
Restructuring scripts
Adding sections
Renaming variables
Namespacing?
Reorganise the messy R script by adding sections and subsections.
Edit the comments in the document to make them more useful.
Rename variables and functions if you think they need to be renamed.
This code runs without errors but…
Linting - analysing source code for:
style issues, e.g. x<-3 vs x <- 3
likely errors, e.g. mean(x, na.rm = T, na.rm = F)
In R, linting is performed by the {lintr} package.
Run lintr::lint("file.R").
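A small sketch of linting a file, assuming {lintr} is installed. The two-line messy script is made up for illustration and written to a temporary file so the example is self-contained.

```r
# Write a deliberately messy script to a temporary file
messy_file <- tempfile(fileext = ".R")
writeLines(c("x<-3", "y = mean(x, na.rm = T)"), messy_file)

if (requireNamespace("lintr", quietly = TRUE)) {
  # lint() returns the issues found, with line numbers and messages
  lint_results <- lintr::lint(messy_file)
  print(lint_results)
} else {
  message("Install {lintr} from the console with install.packages(\"lintr\").")
}
```

The exact messages depend on which linters are active in your {lintr} version and settings.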
Keyboard shortcuts
Use keyboard shortcuts to lint the current file (or package).
{lintr} tells you what’s wrong, but doesn’t fix it.
The {styler} R package will style your code for you.
Keyboard shortcuts
Add a keyboard shortcut for styler::style_active_file()!
Note: {styler} doesn’t fix all issues found by {lintr}.
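To see what {styler} changes without touching a file, one option is styler::style_text(), which restyles code supplied as text. This sketch assumes {styler} is installed; the messy lines are invented for illustration.

```r
messy_code <- c("x<-3", "y<-mean(x,na.rm=TRUE)")

if (requireNamespace("styler", quietly = TRUE)) {
  # style_text() returns the restyled code and leaves files on disk unchanged
  styled_code <- styler::style_text(messy_code)
  print(styled_code)
} else {
  message("Install {styler} from the console with install.packages(\"styler\").")
}
```

Styling only changes layout such as spacing and indentation, which is why some issues reported by a linter may still need fixing by hand.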
Linting code
Styling code
Adding keyboard shortcuts
Install and load the lintr and styler packages if you don’t already use them.
Run lint() on the messy R script. Do you understand all of the messages?
Run style_file(). What has changed in your script?
Re-run lint() on the script. Have all of the issues been fixed? If not, manually implement changes to the file.
Bonus: Add an RStudio keyboard shortcut for style_active_file().
Imagine a directory structure with a single script, Rscript.R, that contains all of the code for your analysis.
This is fine but:
Rscript.R is 4,000 lines long.
Naming files
Separate words in file names consistently (with - or _).
Note: similar rules apply for variable and function names.
We’ll come back to avoiding analysis_final.R and analysis_final_final.R later!
Often, you don’t just have R code for a project…
… so don’t just organise your R code!
01 depends on 00
02 depends on 01 (and 00)
03 depends on 02 (and 01 and 00)
04 depends on 02 (and 01 and 00, but not 03)
Documentation
Write this stuff down (in a README.md file)!
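One way to make the order of numbered scripts explicit is a short runner script that sources them in sequence. This is only a sketch: the three file names and their contents are hypothetical, and they are created in a temporary folder so the example runs on its own.

```r
# Create three tiny numbered scripts (hypothetical names and contents)
project_dir <- file.path(tempdir(), "numbered_scripts_demo")
dir.create(project_dir, showWarnings = FALSE)
writeLines("raw_values <- c(4, 8, 15)",
           file.path(project_dir, "00_import.R"))
writeLines("clean_values <- raw_values[raw_values > 5]",
           file.path(project_dir, "01_clean.R"))
writeLines("mean_value <- mean(clean_values)",
           file.path(project_dir, "02_summarise.R"))

# Two-digit prefixes mean alphabetical order is also dependency order
script_paths <- sort(
  list.files(project_dir, pattern = "^[0-9]{2}_.*\\.R$", full.names = TRUE)
)

# Run each script in a shared environment so later scripts see earlier objects
pipeline_env <- new.env()
for (script_path in script_paths) {
  source(script_path, local = pipeline_env)
}
pipeline_env$mean_value
```

A runner like this only captures a simple linear order. Branching dependencies, such as a script that needs 02 but not 03, still need writing down in the README.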
Creating multiple folders
Creating multiple files
Dependencies of files
Re-organise your R project folder with multiple sub-directories.
Split your messy R script into multiple files, that are appropriately named.
What are the order and dependencies of the scripts?
renv:
isolate project environment
pin specific R package versions to each project
makes it easier to use the same version as collaborators
rig:
manage multiple versions of R
switch between different R versions for different projects
Quarto (or R Markdown):
Combine code with narrative text.
Fully-reproducible documents.
When the document re-renders, all figures and values get updated.
targets:
Compared with the source("script.R") approach, it also manages changes to data.
View the dependency graph with tar_visnetwork().
reprex: reproducible example (small, rigorous, self-contained example)
Makes the problem more specific
Makes the problem reproducible
Makes you think more clearly about programming
Help me, help you
Create a reprex of your programming problem.
The {reprex} package in R makes it easier to create a reproducible example.
You can include session information with it e.g. package versions.
Are there built-in data sets that you can use to reproduce the problem e.g. mtcars?
Can you make a small, synthetic data set?
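A sketch of both ideas, assuming {reprex} is installed for the second part. The column names in the synthetic data are invented and are not taken from any real data set.

```r
# A small synthetic data set; set.seed() makes the random values repeatable
set.seed(2024)
synthetic_data <- data.frame(
  patient_id = seq_len(10),
  age = round(rnorm(10, mean = 50, sd = 10)),
  group = rep(c("A", "B"), each = 5)
)

# Build a reprex around a built-in data set, with session information attached.
# Guarded because reprex() is intended for interactive use.
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex(
    {
      summary(mtcars$mpg)
    },
    session_info = TRUE
  )
}
```

When the problem depends on the structure of your own data, the code that creates the synthetic data would go inside the reprex() call so that the example stays self-contained.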
Assume that you can’t share the hypoxia data with anyone else. Build a reprex that you could share with someone else.
Course website: nrennie.rbind.io/training-better-r-code