R crash course: Writing functions


As you know by now, R is all about functions. In the event that there isn’t one for the exact thing you want to do, you can even write your own! Writing your own functions is a very useful way to automate your work. Once defined, it’s easy to call new functions as often as you need. It’s a good habit to get into when programming with R — and with lots of other languages as well.

Defining a function uses another function simply called function(). Function names follow pretty much the same rules as variable names, so you can call them anything that would also be acceptable as a variable name.

Let’s try an easy example to see how function definitions work:
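A minimal sketch of such a definition. The name printme is our own pick for illustration; any valid variable name would do:

```r
# Define a function that takes one argument, x, and prints it
printme <- function(x) print(x)

# Call it like any other function
printme("Hello, journocoders!")
```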

A function of questionable usefulness: It essentially does the same thing as print(). It takes an argument called x, and prints whatever you put as x to the console.

Theoretically, you can make your function take as many arguments as you want. Just write them in the parentheses of function(). You can call the arguments however you want, too. Also, your functions will probably often require more than one line. In that case, just put whatever you want your function to do in curly brackets {}. It will look somewhat like this:
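Here is a sketch of a two-argument function with a body in curly brackets. The name squareadd and what it computes (x squared plus y) follow the examples discussed further down; the exact layout of the body is an assumption:

```r
# Takes two arguments, squares the first and adds the second
squareadd <- function(x, y) {
  result <- x^2 + y
  return(result)
}

squareadd(3, 2)   # 3^2 + 2
```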

Let’s mess with that one a bit! Run the following code line by line and try to guess what went wrong.
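The calls discussed in the next section could look roughly like this. We wrapped the failing calls in try() so the whole script keeps running after an error; that wrapper is our addition, and typing the bare calls into the console shows the same error messages:

```r
squareadd <- function(x, y) {
  x^2 + y
}

try(squareadd(3))          # only one argument, but two are expected
try(squareadd(3, "two"))   # "two" is a character, not a number
try(squareadd(3, two))     # no object called two exists yet
two <- 2
squareadd(3, two)          # now R finds two: 3^2 + 2

# Same body, but only x is declared as an argument
squareadd2 <- function(x) {
  x^2 + y
}

try(squareadd2(3, 2))      # one argument too many
try(squareadd2(3))         # y is not defined anywhere yet
y <- 4
squareadd2(3)              # y is found in the global environment: 3^2 + 4
```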

Possible errors while writing functions

Errors aren’t just a necessary evil in coding. By making mistakes, you get to know your programming language better and find out what works — and, of course, what doesn’t work. Let’s go through the errors one by one:

  • squareadd(3): You passed the function only one argument (3, which was attributed to the “x” argument) to work with when it expected two values, one for x and one for y.
  • squareadd(3,"two"): Now you passed the function two arguments, but one’s not a number. It’s a character, since it has quotes around it. But R can’t execute the function with a character. After all, what is 3^2 + "two" supposed to mean?
  • squareadd(3,two): No quotes this time in the second argument. Because the “y” argument is not in quotes and not a number, either, R assumes it’s a variable or some other object. Problem is: R can’t find the object called two anywhere.
  • After you define the object two to be equal to 2, though, R does find a matching object to put as an argument. So this time around, squareadd(3,two) should return the number 11.

After we change the function definition to include only the “x” argument, the errors we get change a little. Note that there’s still a “y” in the function body.

  • squareadd2(3,2): Other way around this time. Your function expected only one argument, but got two.
  • squareadd2(3): You passed the correct number of arguments, but R can’t find anything to use for the y in the function body, neither inside the function nor in the global environment.
  • This is why, after you defined y to be equal to four in the global environment, squareadd2(3) works fine and will return 13 (since 3^2 + 4 = 13).

Scoping Rules in R

Some of the errors you’ll get, such as those in the last two lines, are due to something called the scoping rules of R. These rules define how R looks for the variables it needs to execute a function. It does that by looking through different environments — sub-spaces of your working environment that have their own variables and object definitions — in a certain order. There’s two basic types of scoping:

  • Lexical scoping: Looking for missing objects in the environment where the function was defined.
  • Dynamic scoping: Looking for missing objects in the environment where the function was called.

R uses lexical scoping. So if it doesn’t find the stuff it needs within the function (which, incidentally, has its own little environment), it goes on to look in the environment where the function was defined. In many cases, this will be the global environment, which is what you’re coding in if you’re not inside a specific function. If it doesn’t find what it needs there either, it will continue down the search list of environments. You can take a look at the list by typing search() into your console.

Let’s take a quick look at the difference between dynamic and lexical scoping. Look at the following code and try to guess its output. Execute it in RStudio and see if you’re right.
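A sketch of such a test. The names a, istrue() and check() and the message “that’s right!” come from the explanation that follows; the message in the else branch is our assumption:

```r
a <- TRUE

# istrue() is defined in the global environment, where a is TRUE
istrue <- function() {
  if (a) {
    print("that's right!")
  } else {
    print("that's wrong!")
  }
}

# check() sets its own, local a and then calls istrue()
check <- function() {
  a <- FALSE
  istrue()
}

check()
```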

The output depends on the scoping rules your programming language uses. As you just learned, R uses lexical scoping. So if you call check(), a is set to FALSE only in the function environment of check(). But since istrue() was defined in the global environment, where a is still equal to TRUE, it will print “that’s right!” to your console. If R used dynamic scoping, it would go with a <- FALSE, since that is the value of a in the environment where istrue() was called.

You don’t have to worry too much about the specifics of scoping rules and environments when starting to code, but it’s a useful thing to keep in mind. There’s lots of good info on scoping, searching and environments in R on the web, as well as more tutorials on writing your own functions. We’ll be putting together some resources on our website soon, so stay tuned for that.

But for now — well done! That was a lot of new info to process. print() yourself a “Good job!” to the console before you go on and practice writing some more functions. We’re looking forward to your coding experiences!

Bonus round: Can you count how often the word “function” appears in this text? Guess right and win a complimentary function congratulating you on your newly acquired coding skills.

 

{Credits for the awesome featured image go to Phil Ninh}

R exercise: Analysing data


While using R for your everyday calculations is so much more fun than using your smartphone, that’s not the (only) reason we’re here. So let’s move on to the real thing: How to make data tell us a story.

First you’ll need some data. You haven’t learned how to get and clean data yet. We’ll get to that later. For now you can practice on this data set. The data journalists at Berliner Morgenpost used it to take a closer look at refugees in Germany and kindly put the clean data set online. You can also play around with your own set of data. Feel free to look for something entertaining on the internet – or in hidden corners of your hard drive. Remember to save your data in your working directory to save yourself some unnecessary typing.

Read your data set into R with read.csv(). For this you need a .csv file. Excel sheets can easily be saved as such.

Now you have a data frame. Name it anything you want. We’ll go with data. Check out class(data). It tells you what kind of object you have before you. In this case, it should return "data.frame".
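A minimal sketch of the import step. To keep it runnable without a download, it first writes a tiny stand-in file; the rows and the column name Herkunft (origin) are made up for illustration, only Asylantraege is a column from the suggested data set:

```r
# Stand-in for a downloaded file: write a tiny CSV to a temporary location
csv_file <- tempfile(fileext = ".csv")
writeLines(c("Herkunft,Asylantraege",
             "A,120",
             "B,0",
             "C,35"), csv_file)

# With a real file in your working directory: read.csv("yourfile.csv")
data <- read.csv(csv_file)

class(data)
```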

Time to play!

Remember, if you just type data and run that command, it will print the whole table to the console. That might not be exactly what you want if your dataset is very big. Instead, you can use the handy functions below to get an overview of your data.
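A few base R functions that are commonly used for a first overview, shown on a small made-up data frame (the values and the column Herkunft are assumptions for illustration):

```r
# Toy data standing in for the real data set
data <- data.frame(Herkunft = c("A", "B", "C", "D"),
                   Asylantraege = c(120, 0, 35, 980))

head(data)      # the first rows (six by default)
tail(data)      # the last rows
str(data)       # structure: column names and types
summary(data)   # summary statistics for every column
dim(data)       # number of rows and columns
```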

Try them and play around a little bit. Found anything interesting yet? Anything odd? In the data set we suggested, you’ll notice that the mean and the median are very different in the column “Asylantraege” (applications for asylum). What does that tell you?

Row and column indices

This is how you can take a closer look at a part of the whole set using indices. Indices are the numbers or names by which R identifies the rows or columns of data.
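A sketch of the usual indexing variants, again on a made-up data frame (values and the column Herkunft are assumptions for illustration):

```r
data <- data.frame(Herkunft = c("A", "B", "C", "D"),
                   Asylantraege = c(120, 0, 35, 980))

data[1, ]                # the first row
data[, 2]                # the second column
data[1, 2]               # first row, second column
data[1:2, ]              # rows 1 to 2
data$Asylantraege        # a column, selected by name
data[, "Asylantraege"]   # the same column, name in quotes
```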

The last two alternatives only work if your columns have names. Use the function names() to look them up or change them.

Here are some more useful functions that will give you more information about the columns you’re interested in. Try them!
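For example, these base R functions work on a single column. The toy data frame is made up for illustration:

```r
data <- data.frame(Herkunft = c("A", "B", "C", "D"),
                   Asylantraege = c(120, 0, 35, 980))

mean(data$Asylantraege)     # average
median(data$Asylantraege)   # middle value
min(data$Asylantraege)      # smallest value
max(data$Asylantraege)      # largest value
sum(data$Asylantraege)      # total
table(data$Herkunft)        # how often each value occurs
```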

Subsets and Logic

Now you can take an even more detailed look by forming subsets, parts of your data that meet certain criteria. You’ll need the following logical operators.
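R’s comparison and logical operators, with a sketch of how they can be used to form subsets. The data frame and the thresholds are made up for illustration:

```r
# ==  equal             !=  not equal
# <   less than         >   greater than
# <=  less or equal     >=  greater or equal
# &   and               |   or              !  not

data <- data.frame(Herkunft = c("A", "B", "C", "D"),
                   Asylantraege = c(120, 0, 35, 980))

data$Asylantraege > 100                  # TRUE or FALSE for every row

# Keep only the rows where the condition is TRUE
big <- data[data$Asylantraege > 100, ]

# subset() does the same and lets you combine conditions
mid <- subset(data, Asylantraege > 0 & Asylantraege < 500)
```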

Try to form different subsets of your data to find out interesting stuff. Check if it worked with View(), head(), tail(), etc.

Try to kick out all the rows that have “0” in the column “Asylantraege” (applications for asylum). Look at it again. What happened to mean and median?

Get the answers you want

With everything you learned so far, you can start to get answers. See what questions about your data can be answered by forming data subsets. For example, if you used the data set we suggested: Where do most people seeking refuge in Germany come from?

We made a list of the ten most common countries of origin.
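One way such a ranking could be built is by sorting with order(). This is only a sketch on made-up data with one row per country; the real data set may need to be summed up per country first:

```r
data <- data.frame(Herkunft = c("A", "B", "C", "D"),
                   Asylantraege = c(120, 0, 35, 980))

# Sort the rows by number of applications, largest first
sorted <- data[order(data$Asylantraege, decreasing = TRUE), ]

# Show the top ten (or fewer, if there are fewer rows)
head(sorted, 10)
```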


Ask your own questions. What do you want your data to tell you?

 


R crash course: Workspace, packages and data import


In this crash course section, we’ll talk about importing all sorts of data into R and installing fancy new packages. Also, we’ll learn to know our way around the workspace.

Your workspace in R is like the desk you work at. It’s where all the data, defined variables and other objects you’re currently working with are stored. Like with a desk, you might want to clean it every once in a while and throw out stuff you don’t need any more. There’s a few useful commands to help you do that. Take a look and try them out:
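A sketch of base R commands that are commonly used for this; the variable names are made up and the folder path in the last comment is hypothetical:

```r
x <- 5
y <- "hello"

ls()      # list everything in your workspace
rm(x)     # throw out a single object
ls()      # x is gone, y is still there

# rm(list = ls())   # would clear the whole workspace at once

getwd()   # show the current working directory
# setwd("path/to/your/folder")   # change it (hypothetical path)
```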

R crash course: Vectors


Now that you installed RStudio, learned about assignments and wrote some basic code, there’s nothing stopping you from becoming a journocoder!

To get a deeper understanding of how R stores your data, we’re now going to take a closer look at data structures in R, starting with a central concept: Vectors.

Working with vectors

You will work with vectors a lot in R — and I mean a lot. R loves vectors. It treats a scalar — a single value — as nothing but a vector with only one value. There’s all kinds of data structures in R, but most of them are basically just different compositions of vectors. We will get to know them better as we go along. For example, a matrix consists of a vector cut into multiple pieces of the same length. A list is a combination of vectors with different lengths and R even manages to see data frames as something made of vectors. So if you know how to handle vectors in R, that’s a good step towards coding proficiency.

Vectors are created with the c()-function. Like single values, you can name your vectors however you want and perform all kinds of calculations on them.
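For instance (names and values are made up for illustration):

```r
v <- c(1, 5, 10)       # a numeric vector
w <- c("a", "b", "c")  # a character vector

v * 2    # every element is multiplied by 2
v + 1    # 1 is added to every element
```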

Elements of a vector are separated by a comma in the c() function, but you can generate sequences of numbers in different ways. For example, if you write “1:10” instead of a value, R will add the numbers 1 through 10 to your vector. Also, instead of writing “c(3,3,2,2)”, you can tell R to repeat the numbers 3 and 2 two times each with the rep() function, like I did below with the variable h2. You can also tell R to repeat a whole sequence like with p2. Run the code below and have a closer look at the variables and the output R returns.
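A sketch of these variants. The names h2 and p2 are taken from the text; how exactly they are defined here, and the other names, are assumptions:

```r
s  <- c(1:10)                   # the numbers 1 through 10
h  <- c(3, 3, 2, 2)             # typed out by hand
h2 <- rep(c(3, 2), each = 2)    # repeat 3 and 2 two times each
p  <- 1:3
p2 <- rep(p, times = 2)         # repeat the whole sequence twice

h2
p2
seq(0, 1, by = 0.25)            # a sequence with a step size
```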

Try to create some vectors in different ways by yourself!

Now, define two vectors of the same length (with the same number of elements) and try to do some basic math you’ve learned in the chapter before. For example, try:
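A possible set of calculations to try (the vectors a and b are made up for illustration):

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

a + b   # element by element: 1+4, 2+5, 3+6
a - b
a * b
a / b
a^2     # every element squared
```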

Try some more things if you want. Now go for the basic math functions:
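For example, with two made-up vectors a and b:

```r
a <- c(1, 4, 9)
b <- c(16, 25, 36)

sum(a)      # adds up all elements of a
sqrt(a)     # square root of every element
log(a)      # natural logarithm of every element
exp(a)      # e to the power of every element

sum(a, b)   # compare this ...
a + b       # ... with this
```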

In the last chapter I said sum(5, 4) does the same as 5+4. Is this still true when it comes to vectors? Compare the results!

Operations like sqrt() and log() don’t have a result for negative numbers. They will work for every positive value of your vector, but for the negative elements R gives you a warning message and returns NaN instead of a result. NaN stands for “not a number”. It is possible to work with a vector containing NaNs, but you should double check if you actually want them in there.
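A small sketch with a made-up vector that contains one negative value:

```r
v <- c(4, -9, 16)

roots <- sqrt(v)    # R warns: NaNs produced
roots               # the second element is NaN

is.nan(roots)       # TRUE where the result is "not a number"

# Many functions can skip NaNs if you ask them to
mean(roots, na.rm = TRUE)
```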

 

Watch out!

So far for vectors of the same length. What about vectors that have a different number of elements? Try this:
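Using the two vectors n and m from the explanation below:

```r
n <- c(1, 2)
m <- c(4, 5, 6, 7)

n + m
```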

Works well, hm? But why? The answer is something you should keep in mind: If (for an operation where the vectors have to be the same length) one vector is shorter than the other, R repeats the elements of the shorter vector until the two are the same length! So for “n+m”, R doesn’t calculate “(1, 2)+(4, 5, 6, 7)” but “(1, 2, 1, 2)+(4, 5, 6, 7)”.

 

Interesting functions for your first data analysis

Let’s look at a few useful functions that can help you analyze vectors. Remember to use the help functions or the internet if you don’t understand a function.
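A selection of base R functions that fit this purpose, tried on a made-up vector:

```r
v <- c(12, 7, 3, 25, 7)

length(v)      # number of elements
sort(v)        # sorted from smallest to largest
rev(v)         # in reverse order
min(v)         # smallest value
max(v)         # largest value
mean(v)        # average
median(v)      # middle value
unique(v)      # every value only once
which.max(v)   # position of the largest value
```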

But wait, there’s more: You can round vector elements or turn a vector to a matrix. Look closely at the output of this piece of code: What is the difference between C and C2? What is the difference between C2 and C3?
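A sketch of what this could look like. The names C, C2 and C3 come from the text; the values and the exact calls are assumptions:

```r
# Round six made-up values to one decimal place
C <- round(c(1.23, 4.56, 7.89, 2.34, 5.67, 8.91), 1)

# Turn the vector into a matrix with 2 rows, filled column by column
C2 <- matrix(C, 2)

# Same values and shape, but filled row by row
C3 <- matrix(C, 2, byrow = TRUE)

C
C2
C3
```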

Functions, as you may have already noticed, can work with different parameters that determine their output. These are called arguments. They can be specified in parentheses after the function name. Here, the second argument of the matrix function tells R how many rows the matrix will have. The logical argument byrow controls in what way the matrix will be filled with the vector’s elements. Because this is a crash course, we won’t go much further into vectors and matrices. But if you want to learn more about them, go for it!

 

Oo-de-lally!

At this point you know enough about programming in R to have a closer look at what’s useful for journocoding! In the meantime, it’s always a good idea to play around with what you’ve already learned!

In the next chapters, we will get to know other data structures, like lists or data frames. We will learn how to load data into the workspace, like Excel sheets or csv files. And we will have a look at the most important statistical values that are interesting for journocoders like you and how R can help you analyze and visualize your data. You will learn how to use and write functions and how to use packages in R. Sounds awesome, right? Let’s do it!

 


R crash course: Getting started

At journocode, we’re starting out with an intro to the tool we rely on most right now: The statistical programming language R. “R: A Language for Data Analysis and Graphics” is mostly used in statistics, but is very useful for journalists working with data as well.

You can install R here. Since it is open source, there are tons of packages with additional functions and possibilities. We will show you how to find and install them in the next chapters.

To give you an idea of what R can do: This interactive map of Germany was made with only a few lines of R code and a package named googleVis.

R lets you load data in lots of different formats. It can show you interesting values like the mean of your data, visualize its spread and can also easily perform more complex statistical analyses. There are even great R packages to create beautiful interactive maps and graphs. SRF Data often uses R not only to analyze, but also to visualize data. SRF reporter and coder Timo Grossenbacher even tweeted some graphics on the Swiss elections directly out of the R console.

As you can see, knowing R can be quite useful for data journalists. It is completely free. Although it does not come with a fancy built-in user interface, you can easily download interfaces like RStudio (which we use here at journocode) or the web application framework Shiny, which we will get to know later on. R is a powerful tool for crunching numbers and its possibilities grow with every package that is written by the big R-community.

So let’s learn how to code with R!

 

R: Getting started

Below, you can see what my R user interface looks like. If you download RStudio, it will look exactly like this, although the default background is white. You can customize your background and lots of other things in the settings (Tools > Global Options).


If you download R without a user interface, you will “only” get the console without all the extra stuff there to make your work easier.

With RStudio, you have the possibility to open several scripts in the editor, where you can write, edit and save your code. Then of course, there’s the R console. This is where the magic happens. The windows on the top right show your command history and the environment of the session. The “environment” is like the desk you work at. It stores all the variables you have defined and the data you imported. Your command history shows the lines of code you recently sent to the console. On the bottom right, you can look through your files, plots, installed packages et cetera.

Just install RStudio and play with the interface to get to know it a little bit.

 

Editor, console, plot window?

To open a new script, you only have to click the new-script symbol on the top left. The script is just a text document containing your code. Theoretically, you could write it in any text editor you want. It wouldn’t be as much fun, though. The RStudio editor automatically highlights your code and indents it so it looks nice and clean.

You could also code directly in the R console. But if you have a lot of code and a bunch of different projects at the same time, the editor is very useful to organize and save your code.

If you press the Source button, every line of code in the script is sent to the console at once. If you press the Run button, RStudio only executes the line of code where your cursor is located. If you don’t want to click, Ctrl+ENTER (or Cmd+ENTER, if you’re on a Mac) does the same thing as the Run button.

(Personally, I don’t like to use the Source button. I start with the first line of code and then click the Run button multiple times to send all my code to the console. This leads to the same result as clicking the Source button. It takes longer and is kind of foolish, but for me, it’s just a lot more fun.)

For the beginning, I recommend using the Run button. It makes it easier to understand everything the code does step by step, which is very important when learning to code. Below, you can see how the interaction of the script, the Run button, the console and the plot window works.

If you look at the environment, you see how the variables x and y are added after pressing Run. x and y, in this case, are the names I gave the two vectors that tell R how to plot “The house of St. Nicholas” (it’s a German thing where you try to draw a house in one continuous line… don’t worry about it).
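If you want to try it yourself, here is a sketch of how such a script could look. The coordinates are our own; they trace a square with both diagonals and a roof in one continuous line:

```r
# Corner points of the house, in the order the line visits them
x <- c(0, 1, 0, 1, 0.5, 0, 0, 1, 1)
y <- c(0, 0, 1, 1, 1.5, 1, 0, 1, 0)

# type = "l" connects the points with lines instead of drawing dots
plot(x, y, type = "l")
```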


So now that you installed R and RStudio and know how to use the interface, hop on to the next chapter. Let’s start coding like a boss.