From the data to the story: A typical ddj workflow in R

From the data to the story: A typical ddj workflow in R

R is getting more and more popular among Data Journalists worldwide, as Timo Grossenbacher from SRF Data pointed out recently in a talk at useR!2017 conference in Brussels. Working as a data trainee at Berliner Morgenpost’s Interactive Team, I can confirm that R indeed played an important role in many of our lately published projects, for example when we identified the strongholds of german parties. While we also use the software for more complex statistics from time to time, something that R helps us with on a near-daily basis is the act of cleaning, joining and superficially analyzing data. Sometimes it’s just to briefly check if there is a story hiding in the data. But sometimes, the steps you will learn in this tutorial are just the first part of a bigger, deeper data analysis.

JavaScript: Interactive Leaflet.js map with search bar

JavaScript: Interactive Leaflet.js map with search bar

In our previous posts about Leaflet.js, we coded an interactive marker map and learned how to update our data with a google spreadsheet. In this short tutorial, we will show you how to make your map searchable so users can find a specific marker.

We’ll use the code of the already finished map that’s hosted on our GitHub page so we don’t have to start at the very beginning again. Simply download the directory to your computer and open it in your text editor. If you’ve never created an interactive map with Leaflet.js before, please have a look at the two tutorials mentioned above before proceeding with this one.

HTML

Adding a search bar to the map isn’t hard at all, thanks to Italian programmer Stefano Cudini. He is the author of Leaflet.Control.Search, a JavaScript library that will do the trick for us. First, download the libraries’ JS and CSS code from his GitHub repository and save it in the map’s directory. It is currently named loop_google_spreadsheet, but you can rename it to whatever you want.

Now, open index.html in the editor and include the two files you just downloaded:

Done. Now we get to the part where we’ll change a little bit more of our map code.

JavaScript

Adding a visual search bar to the map is easy. First, we have to create a new layer which will be searchable later. Then we add the search control. This is how it’s done:

If you want a nice search icon to appear on the bar, download the images directory of Stefano Cudinis library and copy it into your map directory. Now, when you open the index file in a web browser, you should see the search bar on your map.

The search bar is useless as of now, since we haven‘t given the search bar any data to look through. So in the next step, we want to fill markersLayer with the search keywords and the markers’ location. We use the code of the Google Spreadsheet map, so we’ll load the data with Tabletop, like we did in the previous tutorial, and set the URL, size and orientation of our icons. But we’ll delete or comment out the entire part where we actually add the markers on the map:

Now, to add the markers with matching popups to the map and also to the search corpus, we have to add the markers to markersLayer. We define the value searched as the markers’ title and the found position as loc. Then we’ll save every new L.marker as marker, bind the popup and add the whole thing to our layer.

And that’s basically it! Now you can search for the newsrooms’ names and have the map zoom to their corresponding location. You can also add the snippet

to zoom back to the normal level once you close the search bar.

And this is what your result should look like. Click on the search symbol, type in “Dat” and see what happens.

As always, you’ll find the entire code for this example on our GitHub page. If you have any questions, suggestions or feedback, feel free to leave a comment! We’ll try to answer fast.

JavaScript: Loop through a Google Spreadsheet

JavaScript: Loop through a Google Spreadsheet

In our post Coding marker maps with Leaflet.js we filled a map visualization with information given directly in the code, but we also learned how to loop through data in a JSON file to create markers and pop-ups. This time, we will loop through an public Google Spreadsheet. We will do this with the same example map as in the previous post. But this little trick will not only help you with building marker maps but can come in handy for any JavaScript application you need to automate the handling of your data.

So let’s start right away with setting up a data table with Google Spreadsheets.

Adding tooltips to SVG graphics

Adding tooltips to SVG graphics

Java Script libraries and other tools offer cool ways to visualize data, but sometimes, you may want an even more customizable way of presenting a topic on the web. Maybe you already have the perfect graphic, but it’s not interactive yet. In this tutorial, we’ll show you a way to add tooltips to your SVG graphics.

One example for an interactive graphic a library can’t give you is the following site plan showing the Düsseldorfer Rhine Funfair with its attractions and food stands I built for the Rheinische Post. You can find additional information (in german) as tooltips when hovering over some of the attractions.

The basis for the map is a simple image I created with the design program Sketch. The trick I used to add tooltips to some of the map elements doesn’t require much coding and can be used to add tooltips to every website element including divs, text and pictures.

So let’s get right to it!
In this tutorial we will add tooltips to a very simple examplary SVG file you can download here. SVG stands for scaleable vector graphic. It’s the name of a file format where every image object is described not by the colour of its individual pixels, but by its basic shapes and their attributes. A big advantage SVG graphics have over other formats like PNG or JPEG ist that they’re scalable, just like their name says, so they never get blurry, no matter how much you zoom in. One way to create SVG files is to use a design program like Illustrator, Inkscape or Sketch.

Pro tip: RStudio lets you export charts in this format, too. Pimp up boring charts with some creative elements like I did with this stacked barchart on germanys reasons for trading trash:

The following screenshot shows the SVG image we are going to add tooltips to. It contains a big rectangle with a turquoise frame as the background, a red circle, a yellow star and a blue triangle.

Bildschirmfoto 2016-07-30 um 18.54.21

If you save the image as an SVG file and open it with a text editor, this is what you’ll get:

This is not the typical way you expect images to look like. And this is what makes SVG awesome: It translates your artwork into a snippet of XML code. XML files, or Extensible Markup Language files, look a lot like HTML. They’re not, though: XML was developed as a way to store data, while HTML is responsible for displaying data, mostly on the web. XML doesn’t have predefined tags like <a> or <body>, like HTML does. In XML, you’re entirely free to make up your own tags to describe different parts of your data.

SVG graphics still follow a very specific XML structure. That structure will work like a charm. Try putting the SVG code into your websites body. Set the SVG width to a 100%, delete the height settings and save the file as index.html. It should look like this:

Et voilà: The static image is ready for the web! Your browser can interpret the XML code of your SVG image easily and convert it back to a beautiful graphic.

 

Adding interactivness

Great! Now, we will use the jQuery Plugin Tooltipster to add information to this svg! The Plugin has nice presettings and a really good documentation. These three steps will lead you to your first interactive SVG:

Step 1: Download the Plugin

Bildschirmfoto 2016-07-30 um 19.42.35Create a new folder, name it svg_interactive (for example) and store the index.html file in it. You can find and download the Plugin’s scripts and style sheets on our GitHub repository or on the Plugins webpage. The main folder contains plenty of subfolders and documents you’ll barely need. For our example, we’ll only use tooltipster.bundle.min.jstooltipster.bundle.min.css and, for a little bit of extra style, the theme sheet tooltipster-sideTip-punk.min.css. You can either save the whole tooltipster folder in svg_interactive and then set the correct paths to those files in your html head, or you only store the three files you’ll actually use in your folder like we did as shown in the screenshot.

Step 2: Include the scripts

Set the paths to the style sheets and the JS script in the head of your webpage. Don’t forget to include jQuery, since Tooltipster depends on it:

Step 3: Add information to the image elements

Next, we’ll add a small snippet of Java Script code to the head manually. You could also save these few lines in an extra .js file and include them like the others, but for now, we’ll just keep it as an inline script.

This snippet defines a tooltip class with the theme tooltipster-punk and specifies that the content will be HTML. If you add

to an SVG element it will get a tooltipster-punk themed tooltip with HTML styled text. And that’s basically it!

For example, try:

Now, if you want to change the tooltip’s style you can save some CSS settings in a new stylesheet and include it in the webpages head. You can change the font family, the color of the popup background, the popup arrow or the default pink border-bottom, for example, like this:

Save the file as style.css and include it in the .html file. You can also change the trigger for opening and closing the tooltips in the manually added Java Script. You can set it to hover, click or optimize it for mobile devices with a customized trigger like this:

And this is the result: Click on it, hover over the symbols or try it on your smartphone!

So be creative with your data visualizations, even if complex Java Script libraries aren’t yours.

As always, you can get the entire code to this example on our Github Page. If you have any questions or feedback, feel free to leave a comment or mail us!

JavaScript: Coding marker maps with Leaflet.js

JavaScript: Coding marker maps with Leaflet.js

Showing locations on a map can be pretty cool to provide some context for your story or to give your reader an overview of where the story takes place. A good way to build a simple, yet responsive and professional looking map is to use the JavaScript library Leaflet.

In an earlier post, “Your first choropleth map“, we used Leaflet as well, but coded the map using the Leaflet R package, which works like a wrapper to translate the more common Leaflet functions into R syntax. It’s very useful if you’re more used to R syntax and don’t want to learn JavaScript anytime soon. But using the original JS library and coding the map with JavaScript will give us way more freedom when customizing the map, which is why we’ll try it that way today.

As an example, let’s start with a map of the locations of some data journalism newsrooms in the German speaking area. As always you can find all the code of this tutorial on our GitHub page.

This is what the finished map will look like:

R: Your first web application with shiny

R: Your first web application with shiny

Data driven journalism doesn’t necessarily involve user interaction. The analysis and its results may be enough to write a dashing article without ever mentioning a number. But let’s face it: We love to interact with data visualizations! To build those, some basic knowledge of JavaScript and HTML is usually required.
What? Your only coding skills are a bit of R? No problemo! What if I told you there was a way to interactively show users your most interesting R-results in a fancy web app?

Shiny to the rescue

Shiny is a highly customizable web application framework that turns your analysis into an interactive web app. No HTML, no JavaScript, no CSS required — although you can use it to expand your app. Also, the layout is responsive (although it’s not perfect for every phone).

In this tutorial, we will learn step by step how to code the shiny app on Germany’s air pollutants emissions that you can see below.

R: Tidy Data

R: Tidy Data

Unfortunately, data comes in all shapes and sizes. Especially when analyzing data from authorities. You’ll have to be able to deal with pdfs, fused table cells and frequent changes in terms and spelling.

When I analyzed the swiss arms export data as an intern at SRF Data, we had to work with scanned copies of data sheets that weren’t machine-readable, datasets with either french, german or french and german countrynames in the same column as well as fused cells and changing spelling of the categories.

Unsurprisingly, preparing and cleaning messy datasets is often the most time-consuming part of data analysis. Hardley Wickham, creator of  R packages like ggplot and reshape, wrote a very interesting paper about an important part of this data cleaning: the data tidying.
According to him, tidy data has a specific structure:

Each variable is a column, each observation is a row, and each type of observational unit is a table. This framework makes it easy to tidy messy datasets because only a small set of tools are needed to deal with a wide range of un-tidy datasets.

As you may have seen in our post on ggplot2, Wickham calls this tidy format molten data. The idea behind this is to facilitate the analysis procedure by minimizing the effort in preparing the data for the different tools over and over again. His suggestion: Working on tidy, molten data with a set of tidy tools, allowing you to use the saved time to focus on the results.

Bildschirmfoto 2016-02-29 um 15.02.26

Excerpt of Wickhams “Tidy Data”

Practicing data tidying

But how do we tidy messy data? How do we get from raw to molten data and what are tidy tools? Let’s practice this on a messy dataset.

On our GitHub-page, we deposited an Excel file containing some data on marriages in Germany per state and for different years. Download it and open it with Excel to have a first look at it. As you’ll see, it’s a workbook with seven sheets. We have data for 1990, 2003 and for every year from 2010 through 2014. Although this is a quite small dataset which we could tidy manually in Excel, we’ll use this to practice skills that will come in handy when it comes to bigger datasets.

Now check whether this marriage data needs to be tidied:

  • Are there any changing terms?
  • Is the spelling correct?
  • Is every column that contains numbers correctly saved as a numeric column?
  • Are there needless titles, fused cells, empty cells or other annoying noise?

Spoiler alert: The sheets on 2010-2015 are okay, but the first two — not that much. We have different spelling and terms here, as well as three useless columns and one row plus all the numbers saved as text in the first sheet. As said, the mess in this example is limited and we could tidy it up manually with a few clicks. But let’s keep in mind that we’re here to learn how to handle those problems with larger datasets as well.

Within Excel, we will:

  • Delete spare rows and columns (we could do that in R too when it comes to REALLY messy data)
  • Save columns containing numbers as numeric type

Now we’ll change to R.

First of all, we need to install and require all the packages we’re going to use. We’ll do that with an if-statement telling R only to install the package if it hasn’t been installed yet. You could of course do this in the straightforward way without the conditional statement if you remember wether you already installed the package, but this is a quick way to make sure you don’t install something twice needlessly.

To read in the sheets of an Excel workbook, read_excel() from the readxl-package is a useful function. Because we don’t want to load the sheets separately, we’re going to use a loop for this. If you’re interested in learning more about loop functions, stay tuned for our upcomming tutorial on this topic.

messy_data is now a list of seven local data frames with messy_data[[1]] containing the data for 1990, messy_data[[2]] for 2003 and so on. Also, we added a “timestamp” column to each list element which contains the index of the list element.

To save the sheets as list elements is time saving, but we want all the data in one data frame:

If you get an error telling you the frames have different lengths you probably forgot to delete the spare columns in the 1990 sheet. Sometimes there even seems to be something invisble left in empty excel columns. I usually delete three or so of the empty columns and empty rows next to my data to be sure there isn’t something left I can’t see.

Next part: Restructuring the data

With the function gather() of Wickhams tidyr-package, we’ll melt the raw data frame to convert it to a molten format. And let’s change the timestamps created by the read-in loop to the actual year (we could do that with a loop, too, but this is good enough for now).

Oo-De-Lally! This is tidy data in a tidy format! Now we can check if we have to correct the state names (because with bigger datasets, you can’t quickly check and correct spelling and term issues within Excel):

So we got 19 different german Bundesländer. But Google tells us that there are only 16 states in Germany! Let’s have a closer look at the names to check whether we’ll find duplicates:

Yes, there are! For example Baden-Württemberg and BaWü refer to the same state, as well as Hessen, Hesse and Hesssen. You can just manually correct this. For really big datasets, you could also work with regular expressions and string replacement to find the duplicates, but for now, this should be enough:

Now that your data is tidy, the actual analysis can start. A very useful package to prepare molten data is dplyr. Its functions ease filtering the data or grouping it. Not only is this great for taking a closer look at certain subsets of your data, but, because Wickhams graphics package ggplot2 was created to fit the tidy data principle, we can quickly shape the data to be visually analyzed, too.

Here we have some examples for you showing how tidy data and tidy tools can work hand in hand. If you want to learn something about the graphics package ggplot2 first, visit our post for beginners on this!

Visual analysis with ggplot2: this may look complicated at first, but once you have coded the first ggplot you only have to change or/and add a few things to create several more and totally different plots.

Bildschirmfoto 2016-03-05 um 00.15.33

Maybe you’ve got some other questions this data could answer for you? Feel free to continue this analysis or try to tidy your own data set!

If you have any questions, problems or feedback, simply leave a comment, email us or join our slack team to talk to us any time!

 

{Credits for the awesome featured image go to Phil Ninh}

R: plotting with the ggplot2 package

R: plotting with the ggplot2 package

While crunching numbers, a visual analysis of your data may help you get an overview of your data or compare filtered information at a glance. Aside from the built-in graphics package, R has many additional packages to help you with that.
We want to focus on ggplot2 by Hadley Wickham, which is a very nice and quite popular graphics package.

Ggplot2 is based on a kind of statistical philosophy from a book I really recommend reading. In The Grammar of Graphics, author Leland Wilkinson goes deep into the structure of quantitative plotting. As a product, he establishes a rulebook for building charts the right way. Hadley Wickham built ggplot2 to follow these aesthetics and principles.