ggExtra - Add marginal histograms to ggplot2, and more ggplot2 enhancements
Copyright 2016 Dean Attali. Licensed under
the MIT license.
ggExtra is a collection of functions and layers to enhance ggplot2.
The flagship function is ggMarginal, which can be used to add marginal
histograms/boxplots/density plots to ggplot2 scatterplots. You can view
a live interactive demo to test it out!
Most other functions/layers are quite simple but are useful because they
are fairly common ggplot2 operations that are a bit verbose.
This is an instructional document, but I also wrote a blog post about
the reasoning behind and development of this package.
Note: it was brought to my attention that several years ago there was a
different package called ggExtra, by Baptiste (the author of
gridExtra). That old ggExtra package was deleted in 2011 (two years
before I even knew what R is!), and this package has nothing to do with
the old one.
Installation
ggExtra is available through both CRAN and GitHub.
To install the CRAN version:
install.packages("ggExtra")
To install the latest development version from GitHub:
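A minimal sketch of the GitHub install, assuming the development sources live in the daattali/ggExtra repository and that you are happy to use the remotes helper package (devtools::install_github() would work the same way):

```r
# Assumption: the package sources are hosted at "daattali/ggExtra" on GitHub.
# remotes is a small helper package for installing from GitHub; install it if missing.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("daattali/ggExtra")
```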
Marginal plots RStudio addin/gadget
ggExtra comes with an addin for ggMarginal(), which lets you
interactively add marginal plots to a scatter plot. To use it, simply
highlight the code for a ggplot2 plot in your script, and select
ggplot2 Marginal Plots from the RStudio Addins menu. Alternatively,
you can call the addin directly by calling ggMarginalGadget(plot) with
a ggplot2 plot.
Usage
We’ll first load the package and ggplot2, and then see how all the
functions work.
library("ggExtra")
library("ggplot2")
ggMarginal - Add marginal histograms/boxplots/density plots to ggplot2 scatterplots
ggMarginal() is an easy drop-in solution for adding marginal density
plots/histograms/boxplots to a ggplot2 scatterplot. The easiest way to
use it is by simply passing it a ggplot2 scatter plot, and
ggMarginal() will add the marginal plots.
As a simple first example, let’s create a dataset with 500 points where
the x values are normally distributed and the y values are uniformly
distributed, and plot a simple ggplot2 scatterplot.
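Here is a sketch of that dataset and scatterplot, followed by the call that adds marginal density plots. The seed, the distribution parameters, and the names df1, p1 and p1_marginal are illustrative choices, not requirements:

```r
library("ggplot2")
library("ggExtra")

set.seed(30)  # fix the random seed so the example is reproducible
# 500 points: x ~ Normal(mean = 50, sd = 10), y ~ Uniform(0, 50)
df1 <- data.frame(x = rnorm(500, 50, 10), y = runif(500, 0, 50))

# Plain ggplot2 scatterplot
p1 <- ggplot(df1, aes(x, y)) + geom_point() + theme_bw()
p1

# And now add marginal density plots (the default type)
p1_marginal <- ggMarginal(p1)
p1_marginal
```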
That was easy. Notice how the syntax does not follow the standard
ggplot2 syntax - you don’t “add” a ggMarginal layer with
p1 + ggMarginal(), but rather ggMarginal takes the object as an
argument and returns a different object. This means that you can use
magrittr pipes, for example p1 %>% ggMarginal().
Let’s make the text a bit larger to make it easier to see.
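One way to do that is to raise the theme's base font size and use a two-line axis label. This is a sketch that rebuilds the illustrative p1 plot so it can run on its own; the font size of 30 and the label text are arbitrary:

```r
library("ggplot2")
library("ggExtra")

set.seed(30)
df1 <- data.frame(x = rnorm(500, 50, 10), y = runif(500, 0, 50))
p1 <- ggplot(df1, aes(x, y)) + geom_point() + theme_bw()

# theme_bw(30) sets the base font size to 30, and the two-line y label
# widens the axis area to the left of the points
p_large <- ggMarginal(p1 + theme_bw(30) + ylab("Two\nlines"))
p_large
```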
Notice how the marginal plots occupy the correct space; even when the
main plot’s points are pushed to the right because of larger text or
longer axis labels, the marginal plots automatically adjust.
If your scatterplot has a factor variable mapping to a colour (i.e.
points in the scatterplot are colour-coded according to a variable in
the data, by using aes(colour = ...)), then you can use
groupColour = TRUE and/or groupFill = TRUE to reflect these
groupings in the marginal plots. The result is multiple marginal plots,
one for each colour group of points. Here’s an example using the iris
dataset.
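A sketch of the grouped version, colouring the iris points by Species (the object names are illustrative). You can also show histograms instead of densities by setting type = "histogram":

```r
library("ggplot2")
library("ggExtra")

# Colour-code the points by the Species factor
piris <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point()

# One marginal density curve per species, with matching outline and fill colours
piris_density <- ggMarginal(piris, groupColour = TRUE, groupFill = TRUE)
piris_density

# The same grouping with histograms instead of density curves
piris_hist <- ggMarginal(piris, type = "histogram",
                         groupColour = TRUE, groupFill = TRUE)
piris_hist
```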
There are several more parameters; here is an example that uses a few
of them. Note that you can use any parameters that the geom_XXX()
layers accept, such as col and fill, and they will be passed to
these layers.
ggMarginal(p1, margins = "x", size = 2, type = "histogram",
col = "blue", fill = "orange")
In the above example, size = 2 means that the main scatterplot should
occupy twice as much height/width as the margin plots (default is 5).
The col and fill parameters are simply passed to the ggplot layer
for both margin plots.
If you want to specify some parameter for only one of the marginal
plots, you can use the xparams or yparams parameters, like this:
ggMarginal(p1, type = "histogram", xparams = list(binwidth = 1, fill = "orange"))
Last but not least - you can also save the output from ggMarginal()
and display it later. (This may sound trivial, but it was not an easy
problem to solve - see this
discussion).
p <- ggMarginal(p1)
p
You can also create marginal box plots and violin plots. For more
information, see ?ggExtra::ggMarginal.
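For instance, switching the marginal type only requires changing the type argument. A small sketch using the same illustrative data as earlier:

```r
library("ggplot2")
library("ggExtra")

set.seed(30)
df1 <- data.frame(x = rnorm(500, 50, 10), y = runif(500, 0, 50))
p1 <- ggplot(df1, aes(x, y)) + geom_point() + theme_bw()

# Marginal box plots
p_box <- ggMarginal(p1, type = "boxplot")
p_box

# Marginal violin plots
p_violin <- ggMarginal(p1, type = "violin")
p_violin
```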
Using ggMarginal() in R Notebooks or Rmarkdown
If you try including a ggMarginal() plot inside an R Notebook or
Rmarkdown code chunk, you’ll notice that the plot doesn’t get output. In
order to get a ggMarginal() plot to show up in these contexts, you need
to save the ggMarginal plot as a variable in one code chunk, and
explicitly print it using the grid package in another chunk, like
this:
```{r}
library(ggplot2)
library(ggExtra)
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p <- ggMarginal(p)
```
```{r}
grid::grid.newpage()
grid::grid.draw(p)
```
removeGrid - Remove grid lines from ggplot2
This is just a convenience function to save a bit of typing and
memorization. Minor grid lines are always removed, and the major x or y
grid lines can be removed as well (default is to remove both).
removeGridX is a shortcut for removeGrid(x = TRUE, y = FALSE), and
removeGridY is similarly a shortcut for…
.
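A short sketch of how these functions are added to a plot; the df2 data and the object names are made up for illustration:

```r
library("ggplot2")
library("ggExtra")

set.seed(1)
df2 <- data.frame(x = factor(LETTERS[1:15]), y = runif(15))
p2 <- ggplot(df2, aes(x, y)) + geom_point()

# Remove the minor grid lines and both the x and y major grid lines
p_no_grid <- p2 + removeGrid()
p_no_grid

# Remove the minor grid lines and only the major x grid lines
p_no_x <- p2 + removeGrid(x = TRUE, y = FALSE)
p_no_x

# Same result as the previous call, using the shortcut
p_no_x_short <- p2 + removeGridX()
p_no_x_short
```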
For more information, see ?ggExtra::removeGrid.
rotateTextX - Rotate x axis labels
Often it is useful to rotate the x axis labels to be vertical if
there are too many labels and they overlap. This function accomplishes
that and ensures the labels are horizontally centered relative to the
tick line.
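A sketch with deliberately long, crowded x labels (the data is invented for the example), using the default arguments of rotateTextX():

```r
library("ggplot2")
library("ggExtra")

# 26 long labels that would overlap if drawn horizontally
df2 <- data.frame(x = paste("Letter", LETTERS, sep = "_"),
                  y = seq_along(LETTERS))
p2 <- ggplot(df2, aes(x, y)) + geom_point()

# Rotate the x axis labels so they no longer overlap
p_rotated <- p2 + rotateTextX()
p_rotated
```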
For more information, see ?ggExtra::rotateTextX.
plotCount - Plot count data with ggplot2
This is a convenience function to quickly plot a bar plot of count
(frequency) data. The input must be either a frequency table (obtained
with base::table) or a data.frame with 2 columns where the first
column contains the values and the second column contains the counts.
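Both input forms can be sketched as follows. The first call uses a table built from the infert dataset that ships with R, and the second uses a small made-up data.frame (df3 and its values are purely illustrative):

```r
library("ggplot2")
library("ggExtra")

# An example using a table: counts of each education level
p_table <- plotCount(table(infert$education))
p_table

# An example using a data.frame: first column = values, second column = counts
df3 <- data.frame(vehicle = c("bicycle", "car", "unicycle", "Boeing747"),
                  NumWheels = c(2, 4, 1, 16))
# The result is a regular ggplot2 object, so other ggplot2 layers can be added
p_df <- plotCount(df3) + coord_flip()
p_df
```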
For more information, see ?ggExtra::plotCount.
Testing ggMarginal() visual output
Visual tests (comparing plot output against expected snapshots) are essential for ensuring ggMarginal() produces correct plots. However, these tests are very sensitive to operating system and environment differences, which can cause test failures even when plots seem to be visually identical. To solve this, we use Docker containers to provide a reproducible testing environment with pinned package versions.
When tests are run outside of Docker, the visual tests will be skipped by default unless you set the RunVisualTests environment variable to "yes". To run the visual tests, first build a Docker image and then run a container from it:
If you want to run the tests against the latest version of {ggplot2}, simply omit the build argument from the docker build command.
When adding new ggMarginal() tests (in tests/testthat/helper-funs.R), you’ll need to first generate the expected plots because there is nothing to test against yet. Use the same docker build command, but add an argument to the run command that will mount the container’s folder onto your file system:
docker run --rm -v "$(pwd):/pkg" ggextra-test
Now the new snapshots will be created on your computer and you can review them to make sure they look correct. If you’re happy with them, you can commit them to GitHub, and any further tests will use these images as the expectation.
On GitHub, the visual tests are run against a few {ggplot2} versions that are defined in the GitHub Action workflow.
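If you do want to try the visual tests outside of Docker, the RunVisualTests switch mentioned above can be set from within R before running the test suite. This is only a sketch: how you then launch the tests (for example with devtools::test() from the package root) is an assumption rather than something this document prescribes, and the environment sensitivity described above still applies.

```r
# Opt in to the visual tests for the current R session
Sys.setenv(RunVisualTests = "yes")

# Confirm the variable is visible to code running in this session
Sys.getenv("RunVisualTests")

# Then run the test suite as usual from the package root,
# e.g. devtools::test() (assumed workflow, not shown in this document)
```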