library(pacman)
p_load(dplyr, gapminder, gganimate, ggplot2)
Introduction
Through my Master’s Degree in Statistics at California State University - East Bay, I got a brief introduction to the R package gganimate, which lets you animate your ggplot2 graphs so they change over time (see the package website for more).
Given some of the interesting packages and visualizations I’ve learned about so far through my Master’s education and related R reading, I wanted to see if I could incorporate what I’ve learned into some exploratory data analysis. As an example, we’ll look at the gapminder dataset from the gapminder package, which contains data on life expectancy, GDP per capita, and population by country from 1952 to 2007.
I. Getting started
For the remainder of this blog, you’ll need to have the following packages: dplyr, gapminder, gganimate, and ggplot2.
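Here, I’m using the pacman package (see the two lines at the top of this post) to load the packages. Its p_load() function installs any package that isn’t on my computer yet before loading it and, when called with update = TRUE, also updates packages that are out of date.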
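If you’d rather not depend on pacman, the install-if-missing-then-load behavior can be approximated in base R. This is only a rough sketch, not pacman’s actual implementation, and load_or_install() is a hypothetical helper name I made up for illustration.

```r
# Hypothetical helper: a rough base-R stand-in for pacman::p_load().
# It installs a package only when it is missing, then attaches it.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Demonstrated on packages that ship with R, so nothing gets installed here
load_or_install(c("stats", "utils"))
```

For this post, the equivalent call would be load_or_install(c("dplyr", "gapminder", "gganimate", "ggplot2")).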
Now, let’s take a brief look at the gapminder data set from the gapminder package.
gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows
Here, in the gapminder data set, we can see that each row records a country’s life expectancy, population, and gross domestic product (GDP) per capita in inflation-adjusted US dollars for one year, in five-year increments from 1952 to 2007.
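A quick way to confirm that layout is to list the distinct years and count the rows per country. This is a small sketch using dplyr; the object names survey_years and rows_per_country are my own.

```r
library(dplyr)
library(gapminder)

# Years present in the data: twelve values, five years apart
survey_years <- gapminder |>
  distinct(year) |>
  pull(year)
survey_years

# Rows per country: one for each recorded year
rows_per_country <- gapminder |>
  count(country)
range(rows_per_country$n)
```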
II. Creating initial graph
Before we start making animations, we should have an idea of what we want our graphs to look like.
Let’s start by exploring the range of values for life expectancy by continent in the year 1952 through boxplots.
A. Setting up the boxplot
We can use the geom_boxplot() function to do so and save the result to the object gp_boxplot.
Additionally, we’ll add a vertical line using geom_vline() to display the median life expectancy for that year.
Here, we plot life expectancy values on the x-axis and the continents on the y-axis using data for the year 1952.
gp_boxplot <- gapminder |>
  filter(year == 1952) |>
  ggplot(aes(x = lifeExp, y = continent)) +
  geom_boxplot() +
  geom_vline(aes(xintercept = median(lifeExp)),
             color = "red",
             linewidth = 1) +
  labs(alt = "A boxplot of life expectancy values by continent for 1952. We have a red line marking where the median life expectancy is for that year.")
gp_boxplot
We can see in the plot above that in 1952, Oceania generally has a higher life expectancy and a narrower range of values than the other continents. Keep in mind that Oceania contains only two countries in this data set (Australia and New Zealand), so its boxplot is based on far less data.
We can also see that in 1952, Europe tends to have higher life expectancy than the Americas, Asia, and Africa, with Africa generally having the lowest life expectancy.
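To back up that reading of the boxplot with numbers, we can summarise the 1952 data by continent. This is a sketch; the summary column names and the choice of the interquartile range as the measure of spread are my own.

```r
library(dplyr)
library(gapminder)

lifeExp_1952 <- gapminder |>
  filter(year == 1952) |>
  group_by(continent) |>
  summarise(
    n_countries = n(),                 # how much data each boxplot rests on
    median_lifeExp = median(lifeExp),  # center line of each box
    iqr_lifeExp = IQR(lifeExp)         # width of each box
  ) |>
  arrange(desc(median_lifeExp))
lifeExp_1952
```

The n_countries column is worth a glance before comparing spreads, since a box built from two countries says much less than one built from dozens.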
B. Making the boxplot neater
For aesthetic purposes, let’s think of ways to beautify the graph. We can make the panel’s background white, add a thin border around the panel, and add/modify plot labels through the labs() ggplot2 layer as well as the theme() layer to adjust how the visual elements (i.e., title, color of panel borders, etc.) are displayed.
gp_boxplot <- gp_boxplot +
  # adjusting plotting labels
  labs(x = "Life Expectancy (years)", y = "Continent",
       title = "Life Expectancy by Continent (1952)",
       subtitle = "",
       caption = "Source: gapminder",
       alt = "A boxplot of life expectancy values by continent for 1952. We have a red line marking where the median life expectancy is for that year.") +
  # modifying appearance of plot
  theme(
    # make plotting area (aka "panel") white
    panel.background = element_rect(fill = "white"),
    # make the title boldfaced, slightly larger, and centered
    plot.title = element_text(hjust = 0.4, face = "bold",
                              size = 14),
    # add subtle borders around panel
    panel.border = element_rect(fill = NA,
                                color = "darkgrey")
  )
gp_boxplot
We can also add a label to indicate what the red line represents. First, we create the text for the label and format it to be compact using the str_wrap() function from the package stringr.
medLifeExp_label <- "Median life expectancy" |>
  stringr::str_wrap(width = 10)
medLifeExp_label
[1] "Median\nlife\nexpectancy"
We can now use geom_label() to place a label marking what the red line represents on the boxplot.
gp_boxplot +
  geom_label(
    aes(x = median(lifeExp) - 0.5, y = "Oceania"),
    label = medLifeExp_label,
    color = "red",
    hjust = "right",
    alpha = 1,
    size = 3
  )
Now that we have our graph set up, let’s use gganimate to animate boxplots of life expectancy by continent throughout the years (ranging from 1952 to 2007).
III. Create moving graphs with gganimate
Now that we have an idea of what we want our graph to look like, we can start animating the boxplot from earlier using the gganimate package.
A. Setting up a general animated boxplot
We begin by making our boxplot and then adding the gganimate layer transition_time() to set the variable we’ll flip through for the animation (in this case, the column year). Then, we use the ease_aes() layer to control how the animation eases from one state to the next; we set it to linear so it moves between states at a constant rate.
Going back to our boxplot, we’ll have different boxplots for each year from 1952 to 2007 and flip through them in order in one seamless, animated GIF file (the default output of the animation); each frame in the animation represents a boxplot for each year in sequential order from 1952 to 2007.
Optionally, you can display the current value of the column being flipped through by putting the placeholder {frame_time} in whichever function you use to display text. Here, in the labs() function in the code segment below, we’ll use the plot’s subtitle to label the year of the boxplot shown in each frame; that way, we know which year we’re flipping through during the animation.
gapminder |>
  ggplot(aes(x = lifeExp, y = continent)) +
  geom_boxplot() +
  # modifying the appearance of the plot
  theme(
    panel.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.4, face = "bold",
                              size = 14),
    plot.subtitle = element_text(hjust = 0.4),
    panel.border = element_rect(fill = NA,
                                color = "darkgrey")
  ) +
  # setting column to transition through
  transition_time(year) +
  # speed at which animation flips through plots
  ease_aes('linear') +
  # adding labels
  labs(x = "Life Expectancy (years)", y = "Continent",
       title = "Life Expectancy by Continent",
       # {frame_time} is the placeholder for the current value
       # of the column we transition through
       subtitle = "Year: {frame_time}",
       caption = "Source: gapminder",
       alt = "A moving boxplot of life expectancy values by continent that changes over time from 1952 to 2007 year by year. We can see here that the life expectancy values for all the continents for each year go up consistently.")
Above, we have a seamless animated boxplot that showcases the distribution of life expectancy values per continent over the years (i.e., 1952 to 2007). As you may notice, the subtitle showing the year of the boxplot in frame changes over time, thanks to the labs() function, in which we use {frame_time} to make the displayed year change with each frame.
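If you want to keep the GIF rather than just display it, the animation can be rendered with animate() and written to disk with anim_save(). Below is a minimal sketch, assuming a GIF renderer such as the gifski package is installed. The helper name save_boxplot_gif(), the file name, and the frame count, frame rate, and dimensions are illustrative choices of mine, not the settings used for the animation above.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

# Store the animated plot in an object instead of printing it
anim <- ggplot(gapminder, aes(x = lifeExp, y = continent)) +
  geom_boxplot() +
  transition_time(year) +
  ease_aes("linear") +
  labs(subtitle = "Year: {frame_time}")

# Hypothetical helper: renders the animation and saves it as a GIF.
# Nothing is rendered until the function is called.
save_boxplot_gif <- function(anim, file = "lifeExp_boxplot.gif") {
  rendered <- animate(anim, nframes = 100, fps = 10,
                      width = 600, height = 400)
  anim_save(file, animation = rendered)
  invisible(file)
}
```

Calling save_boxplot_gif(anim) would then write the GIF to the working directory. More frames give a smoother animation at the cost of a longer render and a larger file.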
B. Adding a moving line for median life expectancy
Let’s also plot the median life expectancy for each year so we can see how it changes over time.
First, we create a tibble called median_lifeExp to calculate the median life expectancy across the whole globe for each year from 1952 to 2007.
median_lifeExp <- gapminder |>
  group_by(year) |>
  summarise(medLifeExp = median(lifeExp))
median_lifeExp
# A tibble: 12 × 2
    year medLifeExp
   <int>      <dbl>
 1  1952       45.1
 2  1957       48.4
 3  1962       50.9
 4  1967       53.8
 5  1972       56.5
 6  1977       59.7
 7  1982       62.4
 8  1987       65.8
 9  1992       67.7
10  1997       69.4
11  2002       70.8
12  2007       71.9
Next, we perform a left join between gapminder and median_lifeExp on the column year; this merge operation adds a column for the global median life expectancy that we can compare with a country’s life expectancy for a given year.
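Before plotting, it can be worth checking that the join did what we expect. The sketch below stores the joined data in an object I’ve named gapminder_med (not a name used elsewhere in this post) and names the key explicitly with by = "year", which also keeps dplyr from printing its "Joining with ..." message.

```r
library(dplyr)
library(gapminder)

median_lifeExp <- gapminder |>
  group_by(year) |>
  summarise(medLifeExp = median(lifeExp))

# Left join on year: every gapminder row gets that year's global median
gapminder_med <- gapminder |>
  left_join(median_lifeExp, by = "year")

# Same number of rows as gapminder, plus one new column
dim(gapminder_med)

# Within a year, every row should carry the same global median
medians_per_year <- gapminder_med |>
  group_by(year) |>
  summarise(distinct_medians = n_distinct(medLifeExp))
medians_per_year
```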
With the global median life expectancy values added in, we can use geom_vline() to add a solid red line to represent the global median life expectancy for the year in frame. We can also add a label to mark the red line using geom_label().
gapminder |>
  left_join(median_lifeExp) |>
  ggplot(aes(x = lifeExp, y = continent)) +
  geom_boxplot() +
  # adding a vertical line for median life expectancy
  geom_vline(aes(xintercept = medLifeExp), color = "red", linewidth = 1) +
  geom_label(
    aes(x = medLifeExp - 0.5, y = "Oceania"),
    label = medLifeExp_label,
    color = "red",
    hjust = "right",
    alpha = 1,
    size = 3
  ) +
  # modifying the appearance of the plot
  theme(
    panel.background = element_rect(fill = "white"),
    plot.title = element_text(hjust = 0.4, face = "bold",
                              size = 14),
    plot.subtitle = element_text(hjust = 0.4),
    panel.border = element_rect(fill = NA,
                                color = "darkgrey")
  ) +
  # setting column to transition through
  transition_time(year) +
  # speed at which animation flips through plots
  ease_aes('linear') +
  # adding labels
  labs(x = "Life Expectancy (years)", y = "Continent",
       title = "Life Expectancy by Continent",
       # {frame_time} is the placeholder for the current value
       # of the column we transition through
       subtitle = "Year: {frame_time}",
       caption = "Source: gapminder",
       alt = "A moving boxplot of life expectancy values by continent that changes over time from 1952 to 2007 year by year. We can see here that the life expectancy values for all the continents as well as the median life expectancy across the globe for each year go up consistently.")
Joining with `by = join_by(year)`
So far, we can see that the animation is working as expected, showcasing the distribution of life expectancy values for each continent over time from 1952 to 2007. The red line marking the median life expectancy each year moves with the boxplot.
Based on the animation above, we can see that the median life expectancy across the globe generally increases over time with many continents (except Africa) consistently increasing in terms of life expectancy overall.
Africa’s range of life expectancy values initially increases with other continents before starting to drop slightly around 1998 and then climbing back up around 2004.
Keep in mind that since the data is recorded in five-year increments, gganimate fills in the gaps between recorded years in order to create a seamless transition (the animation advances year by year). Thus, while the animation is smooth, the in-between frames are interpolated rather than observed, so it’s not an exact representation of the data.
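If showing only the recorded years matters more than smoothness, one option is to swap transition_time() for gganimate’s transition_manual(), which maps each value of a variable to its own frame and does no in-between interpolation. This is a sketch of that alternative, not the approach used for the animations above; the object name anim_manual is my own, and the red median line is left out to keep it short.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

# One frame per recorded year, with no interpolated frames in between
anim_manual <- ggplot(gapminder, aes(x = lifeExp, y = continent)) +
  geom_boxplot() +
  transition_manual(year) +
  labs(
    x = "Life Expectancy (years)", y = "Continent",
    title = "Life Expectancy by Continent",
    # transition_manual() exposes {current_frame} rather than {frame_time}
    subtitle = "Year: {current_frame}",
    caption = "Source: gapminder"
  )
```

The trade-off is that the boxes jump from one recorded year to the next instead of gliding, so a low frame rate (for example, animate(anim_manual, fps = 2)) tends to be easier to read.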
IV. Conclusion
Nevertheless, gganimate is a unique package that allows R users to create animated ggplot2 graphics to enhance their data visualizations. It’s useful for visual learners who like to use motion as an extra way of communicating change over time in a striking manner.
While this article made a couple of assumptions about the data (such as assuming that there aren’t any major fluctuations in the data between the five-year increments), it’s been an engaging experience to explore some of what gganimate can do with ggplot2.
I’m hoping to further explore and deepen my understanding of ggplot2 moving forward as well as explore other ways to visualize data, one of which involves creating dashboards with RShiny!
Thank you for reading and be sure to stay tuned for more from The R Files!
Further Resources
A. Books
I recommend the following resources for enhancing your exploration and experimentation with the ggplot2 library.
R for Data Science (2nd Ed): This book (now in its second edition) is a classic and covers some of the bare essentials needed to work with and display all kinds of data as well as strategies for writing clean code. Chapters 10-12 are relevant for those focused on data visualization in general, and Chapters 2-9 cover generally good practices for writing and maintaining clean code. You can read it online here: https://r4ds.hadley.nz/
ggplot2: Elegant Graphics for Data Analysis (3rd Ed): This is a good book that explains the grammar of graphics of ggplot2 and how it works under the surface. Chapters 3-5, 8, 9, 10-14, and 17 are some chapters I recommend for understanding more of the basics of ggplot2’s plotting functions, the steps for making ggplot2 graphs in general, and some of the ways in which the plot aesthetics are made and how they can be modified. You can read the work-in-progress version online here: https://ggplot2-book.org/
Data Visualization with R: It’s a direct guide on building plots with ggplot2 along with best practices for data visualizations in general. Chapters 3-6, 11, and 14 directly focus on ggplot2, its wide array of customization for the aesthetics of its plots, and best practices for creating effective and visually sound graphs. Chapter 10 is also interesting if you’d like to explore other graphs besides the conventional 2D bar plots and scatterplots you see often in ggplot2, such as dumbbell plots and heat maps. Currently, it’s only available as an online bookdown, but a print version is reportedly in the works and should be available on Amazon soon.
B. Links
- Top 50 ggplot2 Visualizations Master List (With Full R Code): This list was where I got some inspiration on using gganimate, namely in regards to the animated bubble plot (found in the section called Correlation under the subsection Bubble plot). It covers a wide variety of data visualizations that you could employ to represent all kinds of data relationships, especially if you need ideas on what type of plot to use to represent your data.
- Getting started (gganimate): On the official website of the gganimate package, the author goes over some of the basics on how to get started with creating basic animations using ggplot2 visualizations and gganimate functions. You can also search this website to find more resources on the features of the gganimate package as well as notices on major changes to it (especially with major overhauls of the API itself with its official release).