R exercise: Analysing data

R exercise: Analysing data

While using R for your everyday calculations is so much more fun than using your smartphone, that’s not the (only) reason we’re here. So let’s move on to the real thing: How to make data tell us a story.

First you’ll need some data. You haven’t learned how to get and clean data, yet. We’ll get to that later. For now you can practice on this data set. The data journalists at Berliner Morgenpost used it to take a closer look at refugees in Germany and kindly put the clean data set online. You can also play around with your own set of data. Feel free to look for something entertaining on the internet – or in hidden corners of your hard drive. Remember to save your data in your working directory to save yourself some unneccessary typing.

Read your data set into R with read.csv(). For this you need a .csv file. Excel sheets can easily be saved as such.

Now you have a data frame. Name it anything you want. We’ll go with data. Check out class(data). It tells you what kind of object you have before you. In this case, it should return data frame.

Time to play!

Remember, if you just type data and run that command, it will print the whole table to the console. That might be not exactly what you want if your dataset is very big. Instead, you can use the handy functions below to get an overview of your data.

Try them and play around a little bit. Found anything interesting yet? Anything odd? In the data set we suggested, you’ll notice that the mean and the median are very different in the column “Asylantraege” (applications for asylum). What does that tell you?

Row and column indices

This is how you can take a closer look at a part of the whole set using indices. Indices are the numbers or names by which R identifies the rows or columns of data.

The last two alternatives only work if your columns have names. Use the function names() to look them up or change them.

Here are some more useful functions that will give you more information about the columns you’re interested in. Try them!

Subsets and Logic

Now you can take and even more detailed look by forming subsets, parts of your data that meet certain criteria. You’ll need the following logical operators.

Try to form different subsets of your data to find out interesting stuff. Check if it worked with View()head()tail(), etc.

Try to kick out all the rows that have “0” in the column “Asylantraege” (applications for asylum). Look at it again. What happened to mean and median?

Get the answers you want

With everything you learned so far, you can start to get answers. See what questions about your data can be answered by forming data subsets. For example, if you used the data set we suggested: Where do most people seeking refuge in Germany come from?

We made a list of the ten most common countries of origin.


Ask your own questions. What do you want your data to tell you?


{Credits for the awesome featured image go to Phil Ninh}