---
title: "Modelling lemur weights with R and Python"
author: "Nicola Rennie"
format:
  nr-revealjs:
    embed-resources: true
---

## Load the data

Load the data from the [#TidyTuesday](https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-08-24/readme.md) repository:

```{r}
#| label: read-data
#| echo: true
#| message: false
#| cache: true
lemurs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv')
```

## Data wrangling

Filter the data to only look at adult male Collared Brown Lemurs, and extract only the age and weight columns:

```{r}
#| label: wrangling
#| echo: true
#| message: false
#| output-location: slide
library(dplyr)
library(knitr)
lemur_data <- lemurs %>%
  filter(taxon == "ECOL",
         sex == "M",
         age_category == "adult") %>%
  select(c(age_at_wt_mo, weight_g)) %>%
  rename(Age = age_at_wt_mo,
         Weight = weight_g)
kable(head(lemur_data))
```

## Modelling

Fit a linear model using Python:

```{python}
#| label: modelling
#| echo: true
#| message: false
lemur_data_py = r.lemur_data

import statsmodels.api as sm

y = lemur_data_py[["Weight"]]
x = lemur_data_py[["Age"]]
x = sm.add_constant(x)
mod = sm.OLS(y, x).fit()
lemur_data_py["Predicted"] = mod.predict(x)
lemur_data_py["Residuals"] = mod.resid
```

## Plot the residuals

```{r}
#| label: plotting
#| echo: true
#| output-location: slide
#| message: false
#| fig-align: center
#| fig-alt: "Scatter plot of predicted and residual values for the fitted linear model."
library(reticulate)
library(ggplot2)
lemur_residuals <- py$lemur_data_py
ggplot(data = lemur_residuals,
       mapping = aes(x = Predicted,
                     y = Residuals)) +
  geom_point(colour = "#2F4F4F") +
  geom_hline(yintercept = 0,
             colour = "red") +
  theme(panel.background = element_rect(fill = "#eaf2f2",
                                        colour = "#eaf2f2"),
        plot.background = element_rect(fill = "#eaf2f2",
                                       colour = "#eaf2f2"))
```
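
## Checking the fit by hand

The model fitted earlier is a simple linear regression of `Weight` on `Age` with an intercept. As a rough illustration of what that fit computes, here is a minimal sketch of the closed-form least-squares solution in plain Python. The function name `fit_simple_ols` and the age and weight values are hypothetical, chosen only for illustration; they are not taken from the lemur data, and the block is shown as static code rather than an executed chunk.

```python
# Closed-form simple linear regression (least squares with an intercept).
# The ages and weights below are made-up illustrative values,
# NOT values from the lemur data set.

def fit_simple_ols(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y about their means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

ages = [24.0, 36.0, 48.0, 60.0, 72.0]                # hypothetical ages (months)
weights = [2050.0, 2150.0, 2200.0, 2300.0, 2350.0]   # hypothetical weights (g)

intercept, slope = fit_simple_ols(ages, weights)
predicted = [intercept + slope * a for a in ages]
residuals = [w - p for w, p in zip(weights, predicted)]
```

Because the model includes an intercept, the residuals sum to zero (up to floating-point error), which is why a horizontal line at zero is a natural reference in a residuals plot.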