R: plotting with the ggplot2 package

While crunching numbers, visualizing your data can help you get an overview or compare filtered information at a glance. Aside from the built-in graphics package, R has many additional packages to help you with that.
We want to focus on ggplot2 by Hadley Wickham, which is a very nice and quite popular graphics package.

ggplot2 is based on a statistical philosophy from a book I really recommend reading. In The Grammar of Graphics, author Leland Wilkinson goes deep into the structure of quantitative plotting. As a result, he establishes a rulebook for building charts the right way. Hadley Wickham built ggplot2 to follow these aesthetics and principles.

To imitate the inherent structure of graphics Wilkinson describes, a ggplot is constructed piece by piece: First, you create the canvas and axes for your chart, then you determine the type of chart and its specific look, like colors and legends. The range of charts ggplot2 offers is rather big: from ordinary bar plots and tile plots to maps. There’s a bunch of different, independent components you can choose from. And, of course, ggplot2 follows Wickham’s understanding of tidy data. In my first week as an intern at the SRF Data Team, I analyzed data on Swiss arms exports according to these principles. We mostly used graphics for data analysis and for the fact sheet we would give to the other editors.

So why is ggplot2 a good choice? Because it’s fast, clear and good-looking by default. So let’s start ggplotting!

Generating data

Earlier, we noted that ggplot2 wants to be given tidy data. In the next post, we will show you how to convert your messy numbers into a well-structured form and what tidy data really means, so stay tuned for this important topic. For now, we’ll start by simply generating a random dataset for our plots:
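One possible way to generate such a dataset, as a minimal sketch. The column names year, category and value, the year range and the value range are assumptions chosen for illustration:

```r
# Hypothetical example data: four categories (A to D) observed over five years.
set.seed(42)  # makes the random values reproducible

years <- 2011:2015
categories <- c("A", "B", "C", "D")

# expand.grid() builds every year/category combination, one per row,
# which directly gives the long layout instead of a wide table.
df <- expand.grid(year = years, category = categories,
                  stringsAsFactors = FALSE)

# One random value per row
df$value <- round(runif(nrow(df), min = 10, max = 100))

head(df)
```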

This data frame looks odd, doesn’t it? Instead of a wide table we built a condensed, long data frame. Wickham calls this format molten data.
Figure: Excerpt from Wickham’s “Tidy Data”

Ggplotting

Say we want to compare the sum of values for each year, but also see the amounts of A, B, C and D. How do we do that? With a stacked barplot in ggplot2.

First of all, let’s install the package. The following function only installs and loads the package if you haven’t done it yet.
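A minimal sketch of such a check could look like this. It assumes you have an internet connection and a CRAN mirror configured in case the package still has to be installed:

```r
# require() tries to load the package and returns FALSE (with a warning)
# if it is not installed, instead of stopping with an error like library().
if (!require("ggplot2")) {
  install.packages("ggplot2")  # download and install from CRAN
  library(ggplot2)             # load it after the installation
}
```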

The function basically tells R “if ggplot2 can’t be loaded because it isn’t installed yet, please install and load it now”.

Now let’s go through this step by step. For the whole code without my nasty explanations, visit the ggplotting repository on our GitHub page.

Every ggplot starts with the function ggplot(). Within this function, we specify the data we want to plot. aes() is short for aesthetics and holds additional plotting information, like which variables should be mapped to the axes. The argument fill, in the case of a bar plot, tells ggplot which variable is used to color the bars. This works for area plots as well, but for line plots and scatterplots you use the argument colour instead. You can use colour for bar plots, too, but it doesn’t fill the bars; it changes the color of their edges.
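As a sketch, the call could look like this. The data frame df and its columns year, category and value are hypothetical stand-ins for your own data:

```r
library(ggplot2)

# Hypothetical long-format example data (all values are made up)
df <- data.frame(
  year     = rep(2011:2015, times = 4),
  category = rep(c("A", "B", "C", "D"), each = 5),
  value    = seq(10, 200, by = 10)
)

# data: what to plot; aes(): which column goes where.
# fill = category means each category gets its own fill color later on.
p <- ggplot(data = df, aes(x = year, y = value, fill = category))
```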

On its own, this function doesn’t do anything visual yet. It initializes a plot, but we haven’t told ggplot what type of plot we actually want. So now we have to add more components. Like, what kind of plot did we want to do again?

ggplot2 offers a range of possibilities for this: geom_area, geom_line and geom_point are just a few of them. For our stacked bar chart, we’ll use geom_bar and set stat = "identity". stat defines the statistical transformation and, for geom_bar, defaults to "count" (it was "bin" in ggplot2 versions before 2.0.0). The default makes the height of each bar equal to the number of cases in each group. If you use this setting, you can’t specify the y argument in aes(). If you want the height of your bars to be defined by a column in your dataset, use stat = "identity" instead to map a value to the y aesthetic.
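To see the difference between the two settings, here is a small sketch with made-up data. It assumes a reasonably recent ggplot2 (2.0.0 or later) and uses layer_data() to peek at the bar heights ggplot2 computes:

```r
library(ggplot2)

# Hypothetical long-format example data (all values are made up)
df <- data.frame(
  year     = rep(2011:2015, times = 4),
  category = rep(c("A", "B", "C", "D"), each = 5),
  value    = seq(10, 200, by = 10)
)

# Default stat: bars show the number of rows per category, so no y in aes()
p_count <- ggplot(df, aes(x = category)) +
  geom_bar()

# stat = "identity": bars show the values from the column mapped to y
p_identity <- ggplot(df, aes(x = category, y = value)) +
  geom_bar(stat = "identity")

# Every category has five rows, so each counted bar has height 5
layer_data(p_count)$count
```

Newer ggplot2 versions also offer geom_col() as a shorthand for geom_bar(stat = "identity").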

Ggplot components are linked with the “+” operator. It’s pretty intuitive: It tells R that all functions linked by a “+” belong to the same plot. Let’s try it:
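Put together, a stacked bar chart along these lines could be built as follows. This is a sketch on made-up data; df and its column names are assumptions:

```r
library(ggplot2)

# Hypothetical long-format example data (all values are made up)
df <- data.frame(
  year     = rep(2011:2015, times = 4),
  category = rep(c("A", "B", "C", "D"), each = 5),
  value    = seq(10, 200, by = 10)
)

# ggplot() sets up data and aesthetics, "+" adds the bar layer.
# Bars are stacked by default, so each year shows its total,
# split by category through the fill colors.
p_bar <- ggplot(data = df, aes(x = year, y = value, fill = category)) +
  geom_bar(stat = "identity")

print(p_bar)
```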

Oo-De-Lally! Doesn’t it look awesome?

Well, yes. That’s nice. But we want more! What about a title or a different theme?

We will name each intermediate step individually to watch the differences the added bricks make step by step:
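One way to do this, sketched on made-up data. The object names p1 to p4, the title and the choice of theme_minimal() are illustrative assumptions:

```r
library(ggplot2)

# Hypothetical long-format example data (all values are made up)
df <- data.frame(
  year     = rep(2011:2015, times = 4),
  category = rep(c("A", "B", "C", "D"), each = 5),
  value    = seq(10, 200, by = 10)
)

# Step 1: the plain stacked bar chart
p1 <- ggplot(df, aes(x = year, y = value, fill = category)) +
  geom_bar(stat = "identity")

# Step 2: add a title
p2 <- p1 + ggtitle("Values per year and category")

# Step 3: rename the axes and the legend
p3 <- p2 + labs(x = "Year", y = "Value", fill = "Category")

# Step 4: switch to a different built-in theme
p4 <- p3 + theme_minimal()

# Print any of p1 to p4 to compare the steps
print(p4)
```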

Hell yeah! A customized ggplot! Now that we’re getting the hang of ggplot2, let’s build some other plots by simply changing some arguments:

Area Plot
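A stacked area plot could be sketched like this. The data, the title and the exact theme settings are assumptions for illustration:

```r
library(ggplot2)

# Hypothetical long-format example data (all values are made up)
df <- data.frame(
  year     = rep(2011:2015, times = 4),
  category = rep(c("A", "B", "C", "D"), each = 5),
  value    = seq(10, 200, by = 10)
)

p_area <- ggplot(df, aes(x = year, y = value, fill = category)) +
  geom_area() +                      # areas are stacked by default
  coord_flip() +                     # swap the x and y axes
  ggtitle("Values per year and category") +
  theme_minimal() +
  # vjust shifts the title vertically; theme() comes after
  # theme_minimal() so the complete theme does not overwrite it
  theme(plot.title = element_text(vjust = 2))

print(p_area)
```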

Try commenting out some components (coord_flip(), for example) or changing values like vjust to check what they do.

Line Plot
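For a line plot, the main change is that the categories are mapped to colour instead of fill. A sketch on made-up data, with assumed column names:

```r
library(ggplot2)

# Hypothetical long-format example data (all values are made up)
df <- data.frame(
  year     = rep(2011:2015, times = 4),
  category = rep(c("A", "B", "C", "D"), each = 5),
  value    = seq(10, 200, by = 10)
)

# colour = category draws one line per category, each in its own color
p_line <- ggplot(df, aes(x = year, y = value, colour = category)) +
  geom_line() +
  geom_point() +   # mark the individual observations on each line
  ggtitle("Values per year and category") +
  theme_minimal()

print(p_line)
```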

Awwww yiss! Now feel free to change some arguments, add new functions and try out all the other ggplots! If you have any problems, questions or feedback simply leave a comment!

 

{Credits for the awesome featured image go to Phil Ninh}
