R exercise: Analysing data

While using R for your everyday calculations is so much more fun than using your smartphone, that’s not the (only) reason we’re here. So let’s move on to the real thing: How to make data tell us a story.

First you’ll need some data. You haven’t learned how to get and clean data yet. We’ll get to that later. For now you can practice on this data set. The data journalists at Berliner Morgenpost used it to take a closer look at refugees in Germany and kindly put the clean data set online. You can also play around with your own set of data. Feel free to look for something entertaining on the internet – or in hidden corners of your hard drive. Remember to save your data in your working directory to save yourself some unnecessary typing.

Read your data set into R with read.csv(). For this you need a .csv file. Excel sheets can easily be saved as such.

Now you have a data frame. Name it anything you want. We’ll go with data. Check out class(data). It tells you what kind of object you have before you. In this case, it should return "data.frame".
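Here is a minimal sketch of these two steps. The file and its contents are invented so the example runs on its own, and the column name Land is an assumption, not taken from the suggested data set. With real data you would simply pass the name of your own .csv file.

```r
# Write a tiny made-up table to a temporary .csv file so the example is
# self-contained. The column "Land" and all values are invented.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("Land,Asylantraege",
             "A,120",
             "B,0",
             "C,45"),
           csv_file)

# Read the file into a data frame and give it a name
data <- read.csv(csv_file)

# Check what kind of object you have
class(data)
```

If your own file uses semicolons instead of commas as separators, read.csv2() or the sep argument of read.csv() may be what you need.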

Time to play!

Remember, if you just type data and run that command, it will print the whole table to the console. That might not be exactly what you want if your data set is very big. Instead, you can use the handy functions below to get an overview of your data.
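The functions in this sketch are common choices for a first overview. They may not be the exact set the authors had in mind, and the toy data frame (including the column name Land) is invented for illustration.

```r
# Toy data frame with invented values, standing in for your own data
data <- data.frame(
  Land = c("A", "B", "C", "D", "E", "F"),
  Asylantraege = c(120, 0, 45, 3000, 0, 15)
)

head(data)      # the first six rows
tail(data, 3)   # the last three rows
str(data)       # structure: column types and a preview of the values
summary(data)   # per-column summary; for numbers this includes mean and median
dim(data)       # number of rows and number of columns
nrow(data)      # number of rows only
ncol(data)      # number of columns only
```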

Try them and play around a little bit. Found anything interesting yet? Anything odd? In the data set we suggested, you’ll notice that the mean and the median are very different in the column “Asylantraege” (applications for asylum). What does that tell you?

Row and column indices

This is how you can take a closer look at a part of the whole set using indices. Indices are the numbers or names by which R identifies the rows or columns of data.
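A short sketch of the usual indexing alternatives, again on an invented toy data frame (the columns Land and Jahr are assumptions). The pattern is always data[rows, columns]. Leaving one side empty means "all of them".

```r
# Toy data frame with invented values
data <- data.frame(
  Land = c("A", "B", "C", "D"),
  Jahr = c(2014, 2014, 2015, 2015),
  Asylantraege = c(120, 0, 45, 3000)
)

data[1, ]               # first row, all columns
data[, 3]               # third column, all rows
data[2, 3]              # a single cell: second row, third column
data[1:3, c(1, 3)]      # rows 1 to 3, columns 1 and 3
data[, "Asylantraege"]  # a column picked by its name
data$Asylantraege       # the same column, using the $ shortcut
```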

The last two alternatives only work if your columns have names. Use the function names() to look them up or change them.
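Looking up and changing column names could look like this. The data frame and the new name Herkunft are made up for the example.

```r
# Toy data frame with invented values
data <- data.frame(Land = c("A", "B"), Asylantraege = c(120, 0))

names(data)                   # look up the current column names
names(data)[1] <- "Herkunft"  # rename the first column
names(data)                   # check the result
```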

Here are some more useful functions that will give you more information about the columns you’re interested in. Try them!
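Which functions the original listing contained is not certain. The sketch below shows a few standard ones, applied to a single column of an invented toy data frame.

```r
# Toy data frame with invented values
data <- data.frame(
  Land = c("A", "B", "C", "D", "E", "F"),
  Asylantraege = c(120, 0, 45, 3000, 0, 15)
)

x <- data$Asylantraege  # the column we are interested in

mean(x)     # arithmetic mean
median(x)   # median
min(x)      # smallest value
max(x)      # largest value
range(x)    # smallest and largest value together
sum(x)      # total
length(x)   # number of values
unique(x)   # each distinct value once
table(x)    # how often each value occurs
```

If a column contains missing values (NA), most of these functions return NA unless you add the argument na.rm = TRUE, as in mean(x, na.rm = TRUE).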

Subsets and Logic

Now you can take an even more detailed look by forming subsets, parts of your data that meet certain criteria. You’ll need the following logical operators.
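As a sketch, these are R's standard comparison and logical operators, each followed by a subset built with it. The toy data frame and the columns Land and Jahr are invented.

```r
# Operators:  ==  equal            !=  not equal
#             <   less than        >   greater than
#             <=  less or equal    >=  greater or equal
#             &   and              |   or              !  not

# Toy data frame with invented values
data <- data.frame(
  Land = c("A", "B", "C", "D"),
  Jahr = c(2014, 2014, 2015, 2015),
  Asylantraege = c(120, 0, 45, 3000)
)

big      <- data[data$Asylantraege > 100, ]                     # more than 100
zeros    <- data[data$Asylantraege == 0, ]                      # exactly 0
recent   <- data[data$Jahr == 2015 & data$Asylantraege > 0, ]   # both conditions
a_or_d   <- data[data$Land == "A" | data$Land == "D", ]         # either condition
not_2014 <- data[!(data$Jahr == 2014), ]                        # negation

# subset() does the same without repeating "data$"
big2 <- subset(data, Asylantraege > 100)
```

A condition like data$Asylantraege > 100 returns one TRUE or FALSE per row. Put in the row position of the square brackets, it keeps only the rows where it is TRUE.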

Try to form different subsets of your data to find out interesting stuff. Check if it worked with View(), head(), tail(), etc.

Try to kick out all the rows that have “0” in the column “Asylantraege” (applications for asylum). Look at it again. What happened to mean and median?
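One possible way to do this, sketched on invented numbers (your results with the real data set will differ):

```r
# Toy data frame with invented values
data <- data.frame(
  Land = c("A", "B", "C", "D", "E", "F"),
  Asylantraege = c(120, 0, 45, 3000, 0, 15)
)

# Keep only the rows where the column is not 0
data_nonzero <- data[data$Asylantraege != 0, ]

# Compare mean and median before and after
mean(data$Asylantraege)
median(data$Asylantraege)
mean(data_nonzero$Asylantraege)
median(data_nonzero$Asylantraege)
```

Saving the result under a new name (data_nonzero is just a suggestion) keeps the original data frame untouched, so you can always go back to it.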

Get the answers you want

With everything you learned so far, you can start to get answers. See what questions about your data can be answered by forming data subsets. For example, if you used the data set we suggested: Where do most people seeking refuge in Germany come from?

We made a list of the ten most common countries of origin.
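How such a ranking could be produced is sketched below. This is not necessarily how the list was made. The toy data, the column name Land and the assumption that a country can appear in several rows are all invented for the example.

```r
# Toy data frame with invented values; some countries appear more than once
data <- data.frame(
  Land = c("A", "B", "C", "A", "B", "D"),
  Asylantraege = c(120, 10, 45, 80, 0, 300)
)

# Add up the applications per country
totals <- aggregate(Asylantraege ~ Land, data = data, FUN = sum)

# Sort the countries from most to fewest applications
ranked <- totals[order(totals$Asylantraege, decreasing = TRUE), ]

# Keep the first ten rows (fewer if there are fewer countries)
top10 <- head(ranked, 10)
top10
```

If every country appears in exactly one row of your data, you can skip the aggregate() step and sort the data frame directly.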


Ask your own questions. What do you want your data to tell you?


{Credits for the awesome featured image go to Phil Ninh}

Comments ( 4 )

  1. Data structures in R: A basic crash course | Journocode
    […] where all elements have the same length. You’ve already gotten to know them a bit in our exercise on analyzing […]
  2. fuphil
    Hi, I'm following these little tutorials with interest and am curious to see where they take me. In the section on indices I get the following warning:
    In mean.default(data$Asylantraege) : Argument ist weder numerisch noch boolesch: gebe NA zurück
    (in English: argument is not numeric or logical: returning NA)
    > min(data$Asylantraege)
    [1] Inf
    Warning message: In min(data$Asylantraege) : no non-missing arguments to min; returning Inf
    What could be causing this? Does somebody know the solution?
    • Kira Schacht
      Hi. Looks like there might be a problem with the vector data$Asylantraege. Maybe something went wrong while reading in the data? min(numeric(0)) returns the same warning, so R thinks your vector does not contain any numeric elements. Try checking head(data$Asylantraege) to see if the numbers look right, or str(data$Asylantraege) to look at the format of the vector (it should be numeric). If something doesn't look right there, try reading the data in again and double-check the arguments to read.csv(). A more detailed look at that function can be found in our data import tutorial. I hope that helps :)
  3. Benzler
    I'm also working through the tutorials at the moment and am very happy with them. A great introduction. The problem with "data$Asylantraege" lies in the first line of the data file: the column names there contain the German umlauts "ä" and "ü". On import, the variable therefore ends up being named, for example, "data$Asylanträge". That leads straight into the encoding problem: when the data is read in the way the course describes, the "ä" is turned into a special character. You can fix this by a) replacing the "ä"s and "ü"s in the first line of the data file with "ae" and "ue" respectively, or b) importing the data via the RStudio menu "File" > "Import Dataset..." > "From CSV" and making sure that UTF-8 is used as the encoding.
