Modelling lemur weights with R and Python

Nicola Rennie

Load the data

Load the data from the #TidyTuesday repository:

lemurs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv')

Data wrangling

Filter the data to only look at adult male Collared Brown Lemurs, and extract only the age and weight columns:

library(dplyr)
library(knitr)
lemur_data <- lemurs %>% 
  filter(taxon == "ECOL",
         sex == "M",
         age_category == "adult") %>% 
  select(c(age_at_wt_mo, weight_g)) %>% 
  rename(Age = age_at_wt_mo, 
         Weight = weight_g)
kable(head(lemur_data))

Data wrangling

Age Weight
129.90 2805
132.10 3001
140.32 2429
157.94 2597
164.58 2497
184.18 2225

Modelling

Fit a linear model using Python:

lemur_data_py = r.lemur_data

import statsmodels.api as sm
y = lemur_data_py[["Weight"]]
x = lemur_data_py[["Age"]]
x = sm.add_constant(x)
mod = sm.OLS(y, x).fit()
lemur_data_py["Predicted"] = mod.predict(x)
lemur_data_py["Residuals"] = mod.resid

Plot the residuals

library(reticulate)
library(ggplot2)
lemur_residuals <- py$lemur_data_py
ggplot(data = lemur_residuals,
       mapping = aes(x = Predicted,
                     y = Residuals)) +
  geom_point(colour = "#2F4F4F") +
  geom_hline(yintercept = 0,
             colour = "red") +
  theme(panel.background = element_rect(fill = "#eaf2f2",
                                        colour = "#eaf2f2"),
        plot.background = element_rect(fill = "#eaf2f2",
                                       colour = "#eaf2f2"))

Plot the residuals

Scatter plot of predicted and residual values for the fitted linear model.