Exploring Life Expectancy by Continent Through gganimate

R
ggplot2
Data Visualization
Author

Ken Vu

Published

November 17, 2023

Introduction

Through my Master’s Degree in Statistics at California State University - East Bay, I got a brief introduction to the R package gganimate, which lets you animate your ggplot2 graphs so they change over time (see the gganimate package website for details).

Given some of the interesting packages and visualizations I’ve learned about so far through my Master’s education, I wanted to see if I could incorporate them into some exploratory data analysis. As an example, we’ll look at the gapminder dataset from the gapminder package, which contains data on life expectancy, GDP per capita, and population by country from 1952 to 2007.

I. Getting started

For the remainder of this blog post, you’ll need the following packages: dplyr, gapminder, gganimate, and ggplot2. The stringr package is also used once later on, called as stringr::str_wrap().

Here, I’m using the pacman package to load the packages and automatically install any that aren’t on my computer yet (p_load() can also update out-of-date packages when called with update = TRUE).

library(pacman)
p_load(dplyr, gapminder, gganimate, ggplot2)
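If you’d rather not depend on pacman, the same idea can be sketched in base R. This is only an approximation of what p_load() does for this post (install what’s missing, then attach everything); the variable names pkgs, is_installed, and missing_pkgs are my own, and install.packages() assumes a CRAN mirror is configured.

```r
# Base-R sketch: install what is missing, then attach the packages.
pkgs <- c("dplyr", "gapminder", "gganimate", "ggplot2")

# requireNamespace() returns FALSE when a package is not installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# attach each package, as library(pkg) would
invisible(lapply(pkgs, library, character.only = TRUE))
```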

Now, let’s take a brief look at the gapminder data set from the gapminder package.

gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

Here, in the gapminder data set, we can see that each row contains a country’s life expectancy, population, and gross domestic product (GDP) per capita in US dollars adjusted for inflation, recorded in five-year increments from 1952 to 2007.
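A quick way to confirm the five-year spacing is to look at the distinct years directly. The sketch below is just a sanity check on the structure described above; the object names years and rows_per_country are mine.

```r
library(dplyr)
library(gapminder)

# distinct recorded years and the gap between consecutive ones
years <- sort(unique(gapminder$year))
years
diff(years)

# number of rows recorded per country (one per recorded year)
rows_per_country <- gapminder |>
  count(country)
range(rows_per_country$n)
```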

II. Creating initial graph

Before we start making animations, we should have an idea of what we want our graphs to look like.

Let’s start with exploring the range of values for life expectancy by continent in the year 1952 through boxplots.

A. Setting up the boxplot

We can use the geom_boxplot() function to do so and save the result to the object gp_boxplot.

Additionally, we’ll add a vertical line using geom_vline() to display the median life expectancy for that year.

Here, we plot life expectancy values on the x-axis and the continents on the y-axis using data for the year 1952.

gp_boxplot <- gapminder |> 
  filter(year == 1952) |> 
  ggplot(aes(x = lifeExp, y = continent)) + 
  geom_boxplot() + 
  geom_vline(aes(xintercept = median(lifeExp)),
             color = "red",
             linewidth = 1) + 
  labs(alt = "A boxplot of life expectancy values by continent for 1952.  We have a red line marking where the median life expectancy is for that year.")
gp_boxplot

A boxplot of life expectancy values by continent for 1952.  We have a red line marking where the median life expectancy is for that year.

We can see in the plot above that in 1952, Oceania generally has higher life expectancy than the other continents as well as a narrower range of values. Keep in mind that Oceania has far fewer countries in this data set than the other continents (only Australia and New Zealand), so there’s much less data behind its boxplot.

We can also see that in the year 1952, Europe overall tends to have higher life expectancy than the Americas, Asia, and Africa, with Africa generally having the lowest life expectancy.
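To back up what the boxplots suggest, we can compute a few summary statistics per continent for 1952. This is a small sketch that isn’t part of the plotting pipeline; the column names n_countries, median_lifeExp, and iqr_lifeExp are ones I made up for illustration.

```r
library(dplyr)
library(gapminder)

summary_1952 <- gapminder |>
  filter(year == 1952) |>
  group_by(continent) |>
  summarise(
    n_countries = n(),                 # one row per country in a given year
    median_lifeExp = median(lifeExp),  # the thick line inside each box
    iqr_lifeExp = IQR(lifeExp)         # the width of each box
  ) |>
  arrange(desc(median_lifeExp))
summary_1952
```

The n_countries column makes the sample-size caveat visible: a box built from a handful of countries will naturally look narrower than one built from dozens.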

B. Making the boxplot neater

For aesthetic purposes, let’s think of ways to beautify the graph. We can make the panel’s background white, add a thin border around the panel, and add or modify plot labels through the labs() ggplot2 layer, using the theme() layer to adjust how the visual elements (e.g., the title and the color of the panel border) are displayed.

gp_boxplot <- gp_boxplot +
  
  # adjusting plotting labels
  labs(x = "Life Expectancy (years)", y = "Continent",
       title = "Life Expectancy by Continent (1952)",
       subtitle = "",
       caption = "Source: gapminder",
       alt = "A boxplot of life expectancy values by continent for 1952.  We have a red line marking where the median life expectancy is for that year.") + 
  
    # modifying appearance of plot
  theme(
        # make plotting area (aka "panel") white
        panel.background = element_rect(fill = "white"),
        
        # make the title boldfaced, slightly larger, and roughly centered
        # (hjust = 0.5 would center it exactly over the plot area)
        plot.title = element_text(hjust=0.4, face = "bold",
                                  size = 14),
        # add subtle borders around panel
        panel.border = element_rect(fill = NA,
                                    color = "darkgrey")
        )
gp_boxplot

A boxplot of life expectancy values by continent for 1952.  We have a red line marking where the median life expectancy is for that year.

We can also add a label to indicate what the red line represents. First, we create the text for the label and format it to be compact using the str_wrap() function from the package stringr.

medLifeExp_label <- "Median life expectancy" |> 
  stringr::str_wrap(width = 10)
medLifeExp_label
[1] "Median\nlife\nexpectancy"

We can now use geom_label() to place a label marking what the red line represents on the boxplot.

gp_boxplot +
  geom_label(
    aes(x = median(lifeExp) - 0.5, y = "Oceania"),
        label = medLifeExp_label,
    color = "red", 
    hjust = "right",
    alpha = 1,
    size = 3
  ) 

A boxplot of life expectancy values by continent for 1952.  We have a red line marking where the median life expectancy is for that year.  Next to the red line is a red box with the label 'Median life expectancy' inside.
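One thing to be aware of: because the label’s position is set inside aes(), geom_label() draws the label once for every row of the data, all stacked on top of each other. With alpha = 1 you can’t see the difference, but a single annotation is a bit cleaner. Here’s a rough sketch of that alternative using annotate(), assuming we compute the median ourselves first; gap_1952, med_1952, label_text, and p are names I’m introducing for this example.

```r
library(dplyr)
library(gapminder)
library(ggplot2)

gap_1952 <- gapminder |>
  filter(year == 1952)

# compute the median once, outside the plot
med_1952 <- median(gap_1952$lifeExp)
label_text <- stringr::str_wrap("Median life expectancy", width = 10)

p <- ggplot(gap_1952, aes(x = lifeExp, y = continent)) +
  geom_boxplot() +
  geom_vline(xintercept = med_1952, color = "red", linewidth = 1) +
  # annotate() adds exactly one label instead of one per data row
  annotate("label",
           x = med_1952 - 0.5, y = "Oceania",
           label = label_text,
           color = "red", hjust = "right", size = 3)
p
```

This works nicely for a static plot. For the animated version later on, the label has to move with the data, so mapping its position through aes() is the more natural fit there.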

Now that we have our graph set up, let’s use gganimate to animate boxplots of life expectancy by continent throughout the years (ranging from 1952 to 2007).

III. Create moving graphs with gganimate

Now that we have an idea of what we want our graph to look like, we can start animating the boxplot from earlier using the gganimate package.

A. Setting up a general animated boxplot

We begin by first making our boxplot and then adding the gganimate layer transition_time() to set the variable we’ll step through for the animation (in this case, the column year). Then, we use the ease_aes() layer to control how values ease from one state to the next; we set it to linear so values change at a constant rate between states.

Going back to our boxplot, we’ll have a boxplot for each recorded year from 1952 to 2007 and move through them in order in one seamless, animated GIF file (the default output of the animation); the frames step through the years in sequential order from 1952 to 2007.

Optionally, you can display the current value of the column being animated by putting the placeholder {frame_time} in any text label. Here, in the labs() call in the code segment below, we’ll use the plot’s subtitle to show the year for each frame; that way, we know which year we’re viewing during the animation.

gapminder |> 
  ggplot(aes(x = lifeExp, y = continent)) + 
  geom_boxplot() +
  
  # modifying the appearance of the plot
  theme(
        panel.background = element_rect(fill = "white"),
        plot.title = element_text(hjust=0.4, face = "bold",
                                  size = 14),
        plot.subtitle = element_text(hjust=0.4), 
        panel.border = element_rect(fill = NA,
                                    color = "darkgrey")
        ) + 

  
  # setting column to transition through
  transition_time(year) + 
  
  # speed at which animation flips through plots
  ease_aes('linear') +
  
  # adding labels
  labs(x = "Life Expectancy (years)", y = "Continent",
       title = "Life Expectancy by Continent",
       
       # {frame_time} is a placeholder for the current value
       # of the column we transition through
       subtitle = "Year: {frame_time}",
       caption = "Source: gapminder",
       alt = "A moving boxplot of life expectancy values by continent that changes over time from 1952 to 2007 year by year.  We can see here that the life expectancy values for all the continents for each year go up consistently.")

A moving boxplot of life expectancy values by continent that changes over time from 1952 to 2007 year by year.  We can see here that the life expectancy values for all the continents for each year go up consistently.

Above, we have a seamless animated boxplot that showcases the distribution of life expectancy values per continent over the years (i.e., 1952 to 2007). As you may notice, the subtitle showing the year of the boxplot in frame changes over time, thanks to the labs() function, in which we use {frame_time} to make the displayed year change with each frame.
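If you want more control over the output, you can store the animation in an object, render it explicitly with animate(), and write it to disk with anim_save(). The sketch below uses a stripped-down version of the plot; the frame count, the frame rate, and the file name lifeExp_boxplot.gif are illustrative choices of mine rather than settings used for the animation above, and gifski_renderer() assumes the gifski package is installed.

```r
library(gapminder)
library(gganimate)
library(ggplot2)

anim <- ggplot(gapminder, aes(x = lifeExp, y = continent)) +
  geom_boxplot() +
  transition_time(year) +
  ease_aes("linear") +
  labs(subtitle = "Year: {frame_time}")

# nframes and fps are illustrative values; together they set the GIF's length
rendered <- animate(anim, nframes = 60, fps = 10,
                    renderer = gifski_renderer())

# hypothetical output file name
anim_save("lifeExp_boxplot.gif", animation = rendered)
```

With 60 frames at 10 frames per second, the GIF runs for about six seconds per loop.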

B. Adding a moving line for median life expectancy

Let’s also plot the median life expectancy for each year so we can see how it changes over time.

First, we create a tibble called median_lifeExp to calculate the median life expectancy across the whole globe for each year from 1952 to 2007.

median_lifeExp <- gapminder |> 
  group_by(year) |> 
  summarise(medLifeExp = median(lifeExp))
median_lifeExp
# A tibble: 12 × 2
    year medLifeExp
   <int>      <dbl>
 1  1952       45.1
 2  1957       48.4
 3  1962       50.9
 4  1967       53.8
 5  1972       56.5
 6  1977       59.7
 7  1982       62.4
 8  1987       65.8
 9  1992       67.7
10  1997       69.4
11  2002       70.8
12  2007       71.9

Next, we perform a left join between gapminder and median_lifeExp on the column year; this merge operation adds an additional column for the global median life expectancy that we can compare with a country’s life expectancy for a given year.
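To make the join concrete, here’s a small sketch of what it produces, with the join column spelled out through by = "year". As a cross-check, a grouped mutate() should give the same column without a separate summary table; gap_joined and gap_mutated are names I’m using just for this comparison.

```r
library(dplyr)
library(gapminder)

median_lifeExp <- gapminder |>
  group_by(year) |>
  summarise(medLifeExp = median(lifeExp))

# every gapminder row gains the global median for its year
gap_joined <- gapminder |>
  left_join(median_lifeExp, by = "year")

# alternative: compute the same column in place with a grouped mutate()
gap_mutated <- gapminder |>
  group_by(year) |>
  mutate(medLifeExp = median(lifeExp)) |>
  ungroup()

all.equal(gap_joined$medLifeExp, gap_mutated$medLifeExp)
```

Naming the join column explicitly also silences the "Joining with" message that dplyr prints when it has to guess the key.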

With the global median life expectancy values added in, we can use geom_vline() to add a solid red line to represent the global median life expectancy for the year in frame. We can also add a label to mark the red line using geom_label().

gapminder |> 
  left_join(median_lifeExp) |> 
  
  ggplot(aes(x = lifeExp, y = continent)) + 
  geom_boxplot() +
  
  # adding a vertical line for median life expectancy 
  geom_vline(aes(xintercept = medLifeExp), color = "red", linewidth = 1) +
  
  geom_label(
    aes(x = medLifeExp - 0.5, y = "Oceania"),
        label = medLifeExp_label,
    color = "red", 
    hjust = "right",
    alpha = 1,
    size = 3
  ) +
  
  # modifying the appearance of the plot
  theme(
        panel.background = element_rect(fill = "white"),
        plot.title = element_text(hjust=0.4, face = "bold",
                                  size = 14),
        plot.subtitle = element_text(hjust=0.4), 
        panel.border = element_rect(fill = NA,
                                    color = "darkgrey")
        ) + 

  
  # setting column to transition through
  transition_time(year) + 
  
  # speed at which animation flips through plots
  ease_aes('linear') +
  
  # adding labels
  labs(x = "Life Expectancy (years)", y = "Continent",
       title = "Life Expectancy by Continent",
       
       # {frame_time} is a placeholder for the current value
       # of the column we transition through
       subtitle = "Year: {frame_time}",
       caption = "Source: gapminder",
       alt = "A moving boxplot of life expectancy values by continent that changes over time from 1952 to 2007 year by year.  We can see here that the life expectancy values for all the continents as well as the median life expectancy across the globe for each year go up consistently.")
Joining with `by = join_by(year)`

A moving boxplot of life expectancy values by continent that changes over time from 1952 to 2007 year by year.  We can see here that the life expectancy values for all the continents as well as the median life expectancy across the globe for each year go up consistently.

So far, we can see that the animation is working as expected, showcasing the distribution of life expectancy values for each continent over time from 1952 to 2007. The red line marking the median life expectancy each year moves with the boxplot.

Based on the animation above, we can see that the median life expectancy across the globe generally increases over time with many continents (except Africa) consistently increasing in terms of life expectancy overall.

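Africa’s range of life expectancy values initially increases with the other continents before starting to drop slightly around 1998 and then climbing back up around 2004. Note that these are years read off the animation’s interpolated frames; the data itself is only recorded at 1997, 2002, and 2007.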
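Since reading exact years off a moving plot is tricky, it can help to check Africa’s numbers at the recorded years directly. This is a quick sketch for that purpose; africa_by_year and its column names are my own.

```r
library(dplyr)
library(gapminder)

africa_by_year <- gapminder |>
  filter(continent == "Africa") |>
  group_by(year) |>
  summarise(
    min_lifeExp = min(lifeExp),
    median_lifeExp = median(lifeExp),
    max_lifeExp = max(lifeExp)
  )
africa_by_year
```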

Keep in mind that since the data’s recorded in five-year increments, gganimate interpolates between the recorded years in order to create a seamless transition (the animation advances in steps smaller than five years). Thus, while the animation is smooth, the frames between recorded years show interpolated values rather than actual data.
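If you’d prefer an animation that only ever shows the recorded years, one option is gganimate’s transition_manual(), which displays one frame per distinct value with no in-between frames. Here’s a minimal sketch of that idea (not what I used above); note that the label placeholder for this transition is {current_frame} rather than {frame_time}, and anim_manual is just a name I picked.

```r
library(gapminder)
library(gganimate)
library(ggplot2)

anim_manual <- ggplot(gapminder, aes(x = lifeExp, y = continent)) +
  geom_boxplot() +
  # one frame per recorded year, with no interpolation between them
  transition_manual(year) +
  labs(x = "Life Expectancy (years)", y = "Continent",
       title = "Life Expectancy by Continent",
       subtitle = "Year: {current_frame}")
anim_manual
```

The trade-off is that the result looks more like a slideshow than a smooth animation, but every frame corresponds to actual data.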

IV. Conclusion

Nevertheless, gganimate is a unique package that allows R users to create animated ggplot2 graphics to better enhance their data visualizations. It’s useful for visual learners who like to utilize motion as an extra way of communicating change over time in a striking manner.

While this article made a couple of assumptions about the data (such as assuming that there aren’t any major fluctuations in the data between the five-year increments), it’s been an engaging experience to explore some of what gganimate can do with ggplot2.

I’m hoping to further explore and deepen my understanding of ggplot2 moving forward as well as explore other ways to visualize data, one of which involves creating dashboards with Shiny!

Thank you for reading and be sure to stay tuned for more from The R Files!

Further Resources

A. Books

I recommend the following resources for enhancing your exploration and experimentation with the ggplot2 package.

  • R for Data Science (2nd Ed): This book (now in its second edition) is a classic and covers some of the bare essentials needed to work with and display all kinds of data as well as strategies for writing clean code. Chapters 10-12 are relevant for those focused on data visualization in general, and Chapters 2-9 cover generally good practices for writing and maintaining clean code. You can read it online here: https://r4ds.hadley.nz/

  • ggplot2: Elegant Graphics for Data Analysis (3rd Ed): This is a good book that explains the grammar of graphics of ggplot2 and how it works under the surface. Chapters 3-5, 8, 9, 10-14, and 17 are some chapters I recommend for understanding more of the basics of ggplot2’s plotting functions, the steps for making ggplot2 graphs in general, and some of the ways in which the plot aesthetics are made and how they can be modified. You can read the work-in-progress version online here: https://ggplot2-book.org/

  • Data Visualization with R: It’s a direct guide on building plots with ggplot2 along with best practices for data visualizations in general. Chapters 3-6, 11, and 14 directly focus on ggplot2, its wide array of customization for the aesthetics of its plots, and best practices for creating effective and visually sound graphs. Chapter 10 is also interesting if you’d like to explore other graphs besides the conventional 2D bar plots and scatterplots you see often in ggplot2, such as dumbbell plots and heat maps. Currently, it’s only available as an online bookdown, but a print version is reportedly in the works and should be available on Amazon soon.