Introduction
Throughout graduate school, I developed a fascination with R’s ggplot2 library, which offers more ways to visualize data than a single blog post could summarize. I liked that ggplot2 gives you so many ways to present your data and tailor the result to your needs and preferences, so much so that I’d find every excuse to use it whenever I had data visualizations to make, in class and outside of it.
This fascination with ggplot2 meant a lot of googling and combing through online resources, StackOverflow posts, and textbooks to get the answers I needed. Even with all of that material available, it can be challenging to figure out which tools exist, which ones to use, and when to use them, given how quickly R continues to evolve. While looking things up is an essential part of any learning journey, it can be frustrating, especially for newcomers to R who are unsure where to start.
Thus, for this blog, I’ll share with you some of the tips and tricks I use to enhance my data visualizations. For simplicity’s sake, let’s dive into them with a scatterplot built from one of the most commonly used (or overused) data sets of all time: the mtcars data set.
Tricks for Enhancing ggplot2 Visualizations
To begin, we need the ggplot2 package installed so we can load it (which is done in the code chunk below).
If you don’t have it installed, you can get the package by running the command install.packages("ggplot2") in your R console or scripting environment one time. Once that’s done, you can run the command in the code chunk below.
library(ggplot2)
Now that we have ggplot2 loaded, we can start creating and enhancing our own data visualizations using the mtcars data set (which ships with R and is available as soon as you start a session, so there’s no need to load it).
Here, we create a scatterplot of each car’s weight in units of 1000 lbs (i.e., wt) plotted against the car’s mileage in miles per gallon (i.e., mpg). We save this scatterplot to the object plt1 so we can reuse and modify it later.
plt1 <- mtcars |>
  ggplot(mapping = aes(x = wt, y = mpg)) +
  geom_point()
plt1
With this scatterplot set up, we can now modify it and walk through some tricks for enhancing it below.
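Before styling anything, it can help to glance at the two columns behind the scatterplot so the axis labels and descriptions written later match the data. This is only a small sketch using base R; the object names cols, wt_range, and mpg_range are placeholders of mine, not part of the original post.

```r
# Pull out the two mtcars columns used in the scatterplot.
# Per the mtcars documentation, wt is weight in units of 1000 lbs
# and mpg is miles per (US) gallon.
cols <- mtcars[, c("wt", "mpg")]

# First few rows, to see what the raw values look like
head(cols)

# Smallest and largest value of each column, handy when describing the axes
wt_range <- range(cols$wt)
mpg_range <- range(cols$mpg)
wt_range
mpg_range
```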
TRICK 1: Modify the plot labels.
You can add labels to the plot’s title and axes to give the audience more information on what the plot is trying to communicate.
By default, the axes are the only elements labeled, with the labels taken directly from the names of the columns being passed in.
Using the labs() function below, we can add x- and y-axis labels to plt1.
plt1 <- plt1 +
  # adding labels to axes (wt in mtcars is recorded in units of 1000 lbs)
  labs(x = "Weight (1000 lbs)",
       y = "Miles Per Gallon (mpg)")
plt1
We can also add a title and (optionally) a subtitle and caption to the plot. The title indicates the subject matter of the plot, and the subtitle can provide more context.
As for the caption, you can use it to add a footnote to the plot, which is typically used to indicate the source of the data set. The caption appears at the bottom right of the plot by default, but you can alter where it’s positioned through the plot.caption setting in theme().
plt1 <- plt1 +
  labs(
    # adding title to plot
    title = "Car Weight vs Car Mileage",
    # optional
    subtitle = "How much does a car's weight affect its mileage?",
    caption = "source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411."
  )
plt1
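On the point about caption placement: one way to move the caption is the hjust argument of element_text() applied to plot.caption. The sketch below is self-contained and uses a placeholder caption and a hypothetical object name (caption_demo) rather than the post’s plt1, so treat it as an illustration, not the only approach.

```r
library(ggplot2)

# Hypothetical stand-in plot with a short placeholder caption
caption_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(caption = "source: mtcars data set") +
  # hjust = 0 left-aligns the caption, 0.5 centers it,
  # and 1 (the default in theme_grey) right-aligns it
  theme(plot.caption = element_text(hjust = 0))

caption_demo
```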
Furthermore, you can set alt-text using the parameter alt in labs() to provide an alternative description for those using screen readers to view your scatterplots.
plt1 <- plt1 +
  # modifying plot labels
  labs(
    # alt text
    alt = "A scatterplot showing a car's weight on the x axis (ranging from 1,000 lbs to 6,000 lbs) and a car's mileage on the y axis (ranging from 10 miles per gallon to 35 miles per gallon) for 32 automobiles, with the data obtained from the 1974 Motor Trend magazine. The plot shows a negative association between a car's weight and its mileage: as the car's weight increases, its mileage generally decreases in a roughly linear fashion."
  )
plt1
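Since alt text is not drawn on the plot itself, it is easy to lose track of whether it was actually set. ggplot2 exports get_alt_text() for reading it back. A short sketch, with a shortened placeholder description and a hypothetical object name (alt_demo):

```r
library(ggplot2)

# Placeholder alt text, much shorter than what you would write in practice
alt_text <- "Scatterplot of car weight against miles per gallon for the mtcars data."

alt_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(alt = alt_text)

# get_alt_text() returns the alt text stored on the plot
stored_alt <- get_alt_text(alt_demo)
stored_alt
```

In a Quarto or R Markdown document, the fig-alt (or fig.alt) chunk option is another place to supply this text.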
You can also adjust the alignment of the plot’s title as well as its font size and font face (i.e., bold, italic, etc).
In fact, you can add the function theme() as a layer on top of your ggplot graph to make further modifications to the visual aesthetics of your plot.
Here, I’m going to center the plot’s title and subtitle, with the title larger and in boldface. I’ll also make the caption smaller. We can achieve these text adjustments by passing the function element_text() to plot.title, plot.subtitle, and plot.caption, respectively, in the theme() function.
element_text() is a ggplot2 function, used primarily inside the theme() layer, that lets you control the appearance of any text element on your graph.
plt1 <- plt1 +
  # changing text of plot title and plot subtitle
  theme(plot.title = element_text(hjust = 0.5, size = 24,
                                  face = "bold"),
        plot.subtitle = element_text(hjust = 0.5),
        # changing size of caption
        plot.caption = element_text(size = 7)
  )
plt1
You can see in the plot above how much neater it looks with a resized title and properly labeled axes.
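The same element_text() approach works for other text elements, not just the title, subtitle, and caption. As a rough sketch (the specific sizes, colors, and the object name text_demo are arbitrary choices of mine), here it is applied to the axis titles and tick labels:

```r
library(ggplot2)

text_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme(
    # italic, slightly larger axis titles
    axis.title = element_text(face = "italic", size = 12),
    # muted color for the tick labels on both axes
    axis.text = element_text(colour = "grey30"),
    # rotate the x-axis tick labels 45 degrees and right-align them
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

text_demo
```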
TRICK 2: Remove the major and minor gridlines
Personally, I don’t mind major and minor gridlines on the plot as they can help the audience better gauge where points are on the plot.
Nevertheless, the option to remove them is there. In the theme() function, if you want to remove any particular element in the plot, you can set that element equal to the function element_blank().
Here, we remove all major gridlines and minor gridlines by respectively setting panel.grid.major = element_blank() and panel.grid.minor = element_blank().
plt1 <- plt1 +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
plt1
Using different arguments, you can also control which specific major/minor gridlines you want to remove (i.e., the ones along the x-axis and/or the y-axis).
To showcase all of them, I’ll recreate the result from the previous plot by manually removing each set of gridlines (major and minor) along both the x and y axes. Depending on your plotting needs, you can use one, some, or all four of the arguments (see code below).
As before, if you want to remove any particular element on your plot (especially one that is shown by default, such as the axis tick marks), identify the parameter in theme() you want to modify (e.g., panel.grid.major, axis.line.x, etc) and set it equal to element_blank().
plt1 +
  theme(
    # removing major and minor gridlines for x-axis
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    # removing major and minor gridlines for y-axis
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank()
  )
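To illustrate using only some of these arguments, here is a sketch that keeps the major horizontal gridlines (the ones that help read mpg values) and drops everything else. The object name grid_demo is a placeholder, and the plot is rebuilt from scratch so the snippet runs on its own.

```r
library(ggplot2)

grid_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme(
    # drop the vertical major gridlines (those tied to the x-axis breaks)
    panel.grid.major.x = element_blank(),
    # drop all minor gridlines, both x and y, in one go
    panel.grid.minor = element_blank()
  )

grid_demo
```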
TRICK 3: Add in borders around the plotting area
Adding a border around the plotting area can help separate the axes of the plot from the area where the data points are plotted, which makes the plot easier to read.
To add a border around the entire plotting area using theme(), we pass an element_rect() object to the parameter panel.border; for this object (as with element_text(), element_line(), and other element-related objects), you can modify its color and line width.
Here, for the element_rect() function passed into the parameter panel.border, we have color = "black", fill = NA (to keep element_rect() from filling in the rectangle and hiding all the data points), and linewidth = 0.75. As of ggplot2 3.4.0, the size argument of element_rect() is deprecated and raises a warning, so linewidth is used instead.
plt1 <- plt1 +
  theme(panel.border = element_rect(color = "black",
                                    fill = NA,
                                    linewidth = 0.75))
plt1
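element_rect() also accepts a linetype argument, so the border does not have to be a solid line. A small sketch, assuming ggplot2 3.4.0 or later (for linewidth) and using arbitrary styling choices and a hypothetical object name (border_demo):

```r
library(ggplot2)

border_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme(
    panel.border = element_rect(
      colour = "grey40",     # border color
      fill = NA,             # keep the panel see-through so points stay visible
      linewidth = 0.5,       # requires ggplot2 >= 3.4.0
      linetype = "dashed"    # dashed instead of the default solid line
    )
  )

border_demo
```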
TRICK 4: Change the panel background color.
As you can see below, the background of the plotting area or panel is grey, which can be changed.
plt1
In the theme() function, you can change the background color of the plotting area by passing the element_rect() function to the parameter panel.background, which ONLY controls the appearance of the plotting area and NOT the area outside of it.
In the element_rect() function, you pass the argument fill = "white" to make the plotting area (i.e., the “panel”) white (see code and results below).
plt1 +
  theme(panel.background = element_rect(fill = "white"))
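To make the panel-versus-outside distinction concrete: the area around the panel (behind the title, axis labels, and caption) is controlled by a separate theme() parameter, plot.background. The sketch below sets both so the difference is visible; the colors and the object name bg_demo are arbitrary picks for illustration.

```r
library(ggplot2)

bg_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme(
    # the plotting area (the "panel") where the points are drawn
    panel.background = element_rect(fill = "white"),
    # everything outside the panel: margins, titles, axis labels
    plot.background = element_rect(fill = "grey90")
  )

bg_demo
```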
In general, if you want to make the background (or any element of the graph) a different color, replace "white" in the argument fill = "white" (or whichever argument controls color for that element, such as color) with any named color (e.g., "black", "blue", "green", etc) or the hex code of a custom color (e.g., something like "#9e8d47").
You can use this color picker here to find the hex codes of custom colors you want to incorporate: https://htmlcolorcodes.com/color-picker/
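If you would rather stay inside R, base R’s colors() function lists every built-in color name, and hex codes can be used anywhere a color name is accepted. A brief sketch (the particular colors and the object name color_demo are just examples):

```r
library(ggplot2)

# colors() comes with base R (grDevices) and returns all built-in color names
named_colors <- colors()
head(named_colors)

# Check whether a name is valid before using it
"steelblue" %in% named_colors

color_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # a hex code for the points
  geom_point(color = "#9e8d47") +
  # a named color for the panel background
  theme(panel.background = element_rect(fill = "aliceblue"))

color_demo
```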
Conclusion
Overall, these are some options I’ve used in the past that you can use to great effect to modify or enhance your ggplot2 data visualizations. There are certainly a lot more beyond what I can cover in this blog post, but I do hope they prove useful to you moving forward.
Should you ever want to learn more about ggplot2’s wide array of customization features and functions, I recommend checking out the resources listed in the Resources section below.
Otherwise, thank you for reading this blog post and stay tuned for more from The R Files blog.
Resources
I recommend the following resources for enhancing your exploration and experimentation with the ggplot2 library.
R for Data Science: This book (now in its second edition) is a classic and covers some of the bare essentials needed to work with and display all kinds of data as well as strategies for writing clean code. Chapters 10-12 are relevant for those focused on data visualization in general as well as Chapters 2-9 for generally good practices for writing and maintaining clean code. You can read it online here: https://r4ds.hadley.nz/
ggplot2: Elegant Graphics for Data Analysis: This is a good book that explains the grammar of graphics behind ggplot2 and how it works under the surface. Chapters 3-5, 8, 9, 10-14, and 17 are some chapters I recommend for understanding more of the basics of ggplot2’s plotting functions, the steps for making ggplot2 graphs in general, and some of the ways in which the plot aesthetics are made and how they can be modified. You can read the work-in-progress version online here: https://ggplot2-book.org/
Data Visualization with R by Rob: It’s a direct guide on building plots with ggplot2 along with best practices for data visualizations in general. Chapters 3-6, 11, and 14 directly focus on ggplot2, its wide array of customization for the aesthetics of its plots, and best practices for creating effective and visually sound graphs. Chapter 10 is also interesting if you’d like to explore other graphs besides the conventional 2D bar plots and scatterplots you see often in ggplot2, such as dumbbell plots and heat maps. Currently, it’s only available as an online bookdown, but a print version is reportedly in the works and may be available on Amazon soon.