Rolling 5-year Sum Label Using `slider`

October 22, 2024

Problem

First blog post off the back of an issue at work which required desktop research to figure out. This content aims to better cement my learning and to share to anyone who has been afflicted with a similar issue.

For context, I was unable to download new packages and only had the slider package available. Notable packages which came up in research are zoo and TTR which could provide similar results I’m sure, but the focus here is on slider.

Whilst calculating a rolling sum took a little while to figure out, the greater block was generating a label for plotting.

Let’s begin with setting global options and getting data.

# Load packages
library(tidyverse)
library(slider)

theme_set(theme_minimal(
 base_family = 'serif', base_size = 12))

set.seed(4321)

# Get data
example_df <- bind_rows(
 tibble(group = "group_1",
  year = c(1991:1993, 1995, 2000:2005),
  to_add = sample(1:40, 10)),
 
 tibble(group = "group_2",
  year = c(2000:2005),
  to_add = sample(1:40, 6)))

Calculate Rolling Sum

Calculate the rolling sum using an reference column or index, in this instance year. Set before argument to 4 which will then include the value itself to provide a rolling 5 year window. This code is using the _int variant of the slide_index_* functions for integer but there are equivalents for double, logical, character, and data frame.

In the function(s) slide_index_* you can’t reference the column that’s being iterated on. Instead I used the slide function to generate labels for plotting.

example_df %>%
 group_by(group) %>% 
 arrange(year) %>%
 mutate(
  # Calculate rolling sum
  rolling_5_years = slide_index_int(to_add, year, sum, .before = 4),
  # Generate rolling sum labels
  rolling_5_years_label = slide(year, ~.x, .before = 4) %>%
   as.character() %>%
   unlist()
 ) %>% 
 head(5)
## # A tibble: 5 × 5
## # Groups:   group [1]
##   group    year to_add rolling_5_years rolling_5_years_label          
##   <chr>   <dbl>  <int>           <int> <chr>                          
## 1 group_1  1991     27              27 1991                           
## 2 group_1  1992     29              56 c(1991, 1992)                  
## 3 group_1  1993     10              66 c(1991, 1992, 1993)            
## 4 group_1  1995     15              81 c(1991, 1992, 1993, 1995)      
## 5 group_1  2000     24              24 c(1991, 1992, 1993, 1995, 2000)

Note that this function helpfully counts missing values within the sequence. See above example, 1994 is missing but for 1995 the rolling years sum of 65 includes only 1991, 1992, 1993, and 1995 to_add values.

Now to visualise!

Visualise

Important to note when using slide to make a label, when the groups starting index value mismatch you will get the below plot which does look messy.

example_df %>%
 group_by(group) %>% 
 arrange(year) %>%
 mutate(
  # Calculate rolling sum
  rolling_5_years = slide_index_int(to_add, year, sum, .before = 4),
  # Generate rolling sum labels
  rolling_5_years_label = slide(year, ~.x, .before = 4) %>%
   as.character() %>%
   unlist()
 ) %>% 
 ggplot(aes(x = rolling_5_years, 
            y = rolling_5_years_label, 
            fill = group)) +
 geom_col(position = "dodge", colour = "white") +
 labs(title = "Rolling 5 year period example",
      subtitle = "A little messy", 
      x = "", y = "") +
 scale_fill_viridis_d(alpha = 0.75, name = "") +
 theme(
  legend.position = "top",
  plot.title.position = "plot"
 ) 

You may be required to include mismatched starting index values for your use case. But if that’s not the case and you want to provide a bit more clarity to the plot - I’ve filtered to exclude any year before 2000, now both groups start at the same year / the same starting index value.

Not all rolling years have the same number of years unless you add the .complete = TRUE argument. Be sure to filter out NA’s afterwards.

I’ve coded the labels to collapse years into colon format to maximise plot space.

# Set base rolling value the same
example_rolling_sum <- example_df %>%
 filter(!year < 2000) %>% 
 group_by(group) %>% 
 arrange(year) %>%
 mutate(
  # Calculate rolling sum
  rolling_5_years = slide_index_int(to_add, year, sum, .before = 4, .complete = TRUE),
  # Generate rolling sum labels
  rolling_5_years_label = slide(as.double(year), ~.x, .before = 4, .complete = TRUE) %>%
   map(~paste(min(.), ":", max(.))) %>% 
   unlist()
 )
## Warning: There were 16 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `rolling_5_years_label = `%>%`(...)`.
## ℹ In group 1: `group = "group_1"`.
## Caused by warning in `min()`:
## ! no non-missing arguments to min; returning Inf
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 15 remaining warnings.
 # Plot
 example_rolling_sum %>% 
 filter(!is.na(rolling_5_years)) %>% 
 ggplot(aes(x = rolling_5_years, 
            y = rolling_5_years_label, 
            fill = group)) +
 geom_col(position = "dodge", colour = "white") +
  labs(title = "Rolling 5 year period example", 
       subtitle = "Pretty",
       x = "", y = "") +
 scale_fill_viridis_d(alpha = 0.75, name = "") +
 theme(
  legend.position = "top",
  plot.title.position = "plot"
 ) 

Conclusion

I think slider is a great package, was I forced to use it because I couldn’t download any other package - sure. But would I use it again given the keys to CRAN, yep. Hope this tidbit about labels helped in some small way. That brings me to the end of the first self-tech-support blog.

Acknowledgements

Packages and package maintainer(s):

  • slider | Davis Vaughan
  • tidyverse | Hadley Wickham
Posted on:
October 22, 2024
Length:
4 minute read, 825 words
Categories:
self-tech-support
Tags:
slider rstats
See Also:
NBA Final 2024 Process Mining