R crash course: Getting started
At journocode, we’re starting out with an intro to the tool we rely on most right now: the statistical programming language R. R, “A Language for Data Analysis and Graphics,” is mostly used in statistics, but it is also very useful for journalists working with data.
Timo Grossenbacher (@grssnbchr), October 22, 2015
R lets you load data in lots of different formats. It can compute summary values such as the mean of your data, visualize how the data is spread out, and easily perform more complex statistical analyses. There are even great R packages for creating beautiful interactive maps and graphs. SRF Data often uses R not only to analyze data but also to visualize it. SRF reporter and coder Timo Grossenbacher even tweeted some graphics on the Swiss elections directly out of the R console.
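To make this concrete, here is a minimal sketch of loading data, computing the mean and visualizing the spread. The table is made-up toy data passed to read.csv() through its text argument, so the snippet runs without a file; the column names district and turnout are hypothetical choices for illustration. With real data you would pass a file path instead, and base R also offers read.delim() for tab-separated files.

```r
# Toy data, made up for illustration. With a real file you would use
# read.csv("path/to/file.csv") instead of the text argument.
votes <- read.csv(text = "district,turnout
A,40
B,45
C,50
D,65")

# A single summary value: the arithmetic mean of the turnout column
avg_turnout <- mean(votes$turnout)
print(avg_turnout)

# Minimum, quartiles, median, mean and maximum in one call
summary(votes$turnout)

# Visualize the spread of the values with a box plot
boxplot(votes$turnout, main = "Turnout (toy data)", ylab = "Turnout in %")
```

summary() is a handy first look at a new column, because it reports the minimum, quartiles, median, mean and maximum in one go.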
As you can see, knowing R can be quite useful for data journalists. It is completely free. Although it does not come with a fancy built-in user interface, you can easily download one like RStudio (which we use here at journocode). There is also Shiny, a web application framework for R, which we will get to know later on. R is a powerful tool for crunching numbers, and its possibilities grow with every package written by the large R community.
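Packages are installed once with install.packages() and then loaded in every session with library(). The sketch below uses ggplot2, a popular plotting package, purely as an example; the install line is commented out so the snippet runs offline, and requireNamespace() checks whether the package is already available before loading it.

```r
# Install a package from CRAN once per computer (needs an internet connection):
# install.packages("ggplot2")

# Check whether the package is available without throwing an error
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

if (has_ggplot2) {
  # Load the package so its functions can be used in this session
  library(ggplot2)
} else {
  message("ggplot2 is not installed yet. Run install.packages(\"ggplot2\") first.")
}

# List the packages and environments currently attached to the session
print(search())
```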
So let’s learn how to code with R!
R: Getting started
Below, you can see what my R user interface looks like. If you download RStudio, it will look exactly like this, although the default background is white. You can customize the background and lots of other things in the settings (Tools > Global Options).
If you download R without a user interface, you will “only” get the console without all the extra stuff there to make your work easier.
With RStudio, you can open several scripts in the editor, where you can write, edit and save your code. Then, of course, there’s the R console. This is where the magic happens. The windows on the top right show your command history and the environment of your session. The “environment” is like the desk you work at: it stores all the variables you have defined and the data you have imported. Your command history shows the lines of code you recently sent to the console. On the bottom right, you can browse your files, plots, installed packages and so on.
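You can also inspect the desk from inside the console. A minimal sketch, with variable names made up for illustration: assigning a value with <- puts an object into the environment, ls() lists what is there, and rm() clears an object away again. The Environment window in RStudio shows the same objects.

```r
# Assigning values with <- places objects in the global environment
city <- "Bern"
seats <- c(12, 7, 5)

# ls() returns the names of all objects currently in the environment
print(ls())

# rm() removes an object from the environment
rm(city)

# city is gone, seats is still there
print(exists("city"))
print(exists("seats"))
```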
Just install RStudio and play with the interface to get to know it a little bit.
Editor, console, plot window?
To open a new script, just click the symbol on the top left (or use File > New File > R Script). The script is just a text document containing your code. In theory, you could write it in any text editor you want. It wouldn’t be as much fun, though. The RStudio editor automatically highlights your code and indents it so it looks nice and clean.
You could also code directly in the R console. But if you have a lot of code and a bunch of different projects at the same time, the editor is very useful to organize and save your code.
If you press the Source button, every line of code in the script is sent to the console at once. If you press the Run button, RStudio only executes the line of code where your cursor is located. If you don’t want to click, Ctrl+Enter (or Cmd+Enter, if you’re on a Mac) does the same thing as the Run button.
(Personally, I don’t like to use the Source button. I start with the first line of code and then click the Run button multiple times to send all my code to the console. This leads to the same result as clicking the Source button. It takes longer and is kind of foolish, but for me, it’s just a lot more fun.)
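RStudio’s Source button runs the script file through R’s source() function. The sketch below imitates that: it writes a tiny, made-up three-line script to a temporary file and then runs the whole file at once, instead of stepping through the lines one by one as you would with Run.

```r
# Write a small example script to a temporary .R file
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(1, 2, 3)",
  "y <- x * 2",
  "total <- sum(y)"
), script)

# Run every line of the file at once, like the Source button does
# (echo = TRUE also prints each line as it is executed)
source(script, echo = TRUE)

# The objects created by the script are now in the environment
print(total)
```

Stepping through the same three lines with Run ends in exactly the same state: x, y and total sit in your environment either way.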
When starting out, I recommend using the Run button. It makes it easier to understand, step by step, everything the code does, which is very important when learning to code. Below, you can see how the script, the Run button, the console and the plot window interact.
If you look at the environment, you can see how the variables x and y are added after pressing Run. In this case, x and y are the names I gave the two vectors that tell R how to plot “The house of St. Nicholas” (it’s a German thing where you try to draw a house in one continuous line… don’t worry about it).
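The exact vectors from the screenshot aren’t shown, so the coordinates below are an assumption. They trace the eight lines of the house (a square with both diagonals, plus a roof) in one continuous path, starting at the bottom left and ending at the bottom right. Sending the lines to the console with Run adds x and y to the environment, and plot() then draws the house in the plot window.

```r
# Corner points visited in order: bottom left, bottom right, top left,
# top right, bottom left, top left, roof peak, top right, bottom right
x <- c(0, 1, 0, 1, 0, 0, 0.5, 1, 1)
y <- c(0, 0, 1, 1, 0, 1, 1.5, 1, 0)

# type = "l" connects the points with lines; asp = 1 keeps the square square
plot(x, y, type = "l", asp = 1, axes = FALSE, xlab = "", ylab = "",
     main = "The house of St. Nicholas")

# Mark the corner points for orientation
points(x, y, pch = 19)
```

Because each line segment starts where the previous one ended, the order of the values in x and y is what turns the nine points into a single continuous drawing.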