Learning Julia with #TidyTuesday and Tidier.jl
Tidier.jl is a Julia implementation of the {tidyverse}, and after 10 weeks of data wrangling and plotting #TidyTuesday data in Julia, I wanted to share what I've learnt about Julia as an R user.
June 1, 2023
Learning a new programming language can be tricky, and it can sometimes lead to more ways of getting confused - even in languages you know well! I always mix up len()
and length()
… When returning to Python after primarily working in R for a while, I found
Shiny for Python a really good resource for learning. The concepts were similar enough that writing Shiny code in R and Python didn’t feel too different.
I dabbled in Julia during Advent of Code last year, and wanted to learn a little but more. So when I noticed that Karandeep Singh had started developing Tidier.jl - a Julia implementation of the {tidyverse} - I was very keen to give it a go! So I started #JuliaTuesday - where I used the data sets from #TidyTuesday, but performed the data wrangling and visualisation in Julia instead of R. After 10 weeks, I wanted to share what I’d learned.
What is Julia?
Let’s back up a second. What is Julia? Julia is an high-level, general-purpose, open source programming language. One of the main differences between Julia and R is that Julia is a compiled language, whereas R is an interpreted language (as is Python). This means it’s likely to be faster than R, and so it piqued my interest. I’m not going to go into the depths of R versus Julia here - plenty of those blog posts already exist! What I want to talk about is Tidier.jl!
Tidier.jl
Tidier.jl is a Julia implementation of the {tidyverse}. One of its aims is to stick as closely to {tidyverse} syntax as possible - which makes it reasonably straightforward to get started as an R user. In Julia, we can install Tidier.jl using:
|
|
This is essentially equivalent to using install.packages()
in R. Before we dive into a full side-by-side comparison of {tidyverse} and Tidier.jl, let’s load some packages and download the data from the
#TidyTuesday GitHub repo. Here, I’m using the US Egg Production data, looking at the number of different types of eggs produced each month in the USA.
|
|
|
|
Though the syntax and function names may be quite different between R and Julia here, you can see that the idea is exactly the same: (i) load packages using Using Pkgname
or library(Pkgname)
, and then (ii) read in the data directly from a URL and save it as a dataframe.
Now let’s see how the two compare on some simple data wrangling tasks. Tidier.jl supports a lot of {tidyverse} functions - primarily those originating in {dplyr} and {tidyr}. See tidierorg.github.io/Tidier.jl/stable/ for full documentation on how these functions work in Tidier.jl.
Let’s filter our data to consider only cage-free organic eggs, and then convert the units to millions of eggs:
|
|
|
|
The key difference here is chaining in Julia. In practice, add an @
before each function, rather than a |>
. The other difference here is that we specify when to begin
and end
the chain, but otherwise - the similarities are striking.
Data visualisation in Julia
There are a lot of different plotting packages in Julia, but the one I found myself using most often was
AlgebraOfGraphics.jl. AlgebraOfGraphics.jl a a data visualisation language for Julia, that’s built on the idea of combining different building blocks (using +
and *
) to make plots. The principles are similar to {ggplot2} - where plots are made of layers. There’s even a theme_ggplot2()
function included with AlgebraOfGraphics.jl, if you want to stick with the classic theme.
|
|
|
|
You can see how similar the two plots appear below (the Julia version is on the left!). The only real difference I could spot was that {ggplot2} automatically labels the x-axis with only the year, whereas AlgebraOfGraphics.jl uses the full date.
Another new package has also recently entered the Julia scene - TidierPlots.jl. TidierPlots.jl brings a reimplementation of {ggplot2} to Julia, which is built on top of AlgebraOfGraphics.jl. Although, I didn’t experiment with TidierPlots.jl for my #JuliaTuesday challenge, I couldn’t resist trying it out for this blog post.
At the time of writing, there isn’t yet an implementation of geom_line()
in TidierPlots.jl, so we’ll go with points instead:
|
|
It’s so similar to {ggplot2} that you can almost copy and paste your code - and just add an @
at the start of each line! TidierPlots.jl is one of the most exciting developments in Julia, and if you’re familiar with {ggplot2} (or indeed AlgebraOfGraphics.jl), the learning curve is very, very gentle.
Final thoughts
I really enjoyed getting to grips with Tidier.jl and found it an easier way to start learning Julia, through using concepts and functions that were already familiar to me as an R user. Of course, being proficient in Tidier.jl doesn’t make me proficient in Julia as a whole, but I did get introduced to some of the differences and quirks of Julia along the way. I’m definitely keen to use a little bit more Julia in my work, including with Quarto and in combination with R, through the {JuliaCall} package.
You can view the visualisations I created (with code) in the Quarto document published on QuartoPub. You can also view the source code on GitHub.
Thanks to Karandeep Singh (and other contributors) for developing Tidier.jl - it’s definitely made my Julia journey easier!
Image: giphy.com
For attribution, please cite this work as:
Learning Julia with #TidyTuesday and Tidier.jl.
Nicola Rennie. June 1, 2023.
nrennie.rbind.io/blog/learning-julia-with-tidytuesday-tidier
BibLaTeX Citation
@online{rennie2023, author = {Nicola Rennie}, title = {Learning Julia with #TidyTuesday and Tidier.jl}, date = {2023-06-01}, url = {https://nrennie.rbind.io/blog/learning-julia-with-tidytuesday-tidier} }
Licence: creativecommons.org/licenses/by/4.0