A web scraping toolkit for journalists

Web scraping is one of the most useful and least understood methods for journalists to gather data. It’s the thing that helps you when, in your online research, you come across information that qualifies as data, but does not have a handy “Download” button. Here’s your guide on how to get started — without any coding necessary.

Note: This tutorial was first published in our data-driven Advent calendar 2018, behind door number 13.

The Data-driven Advent Calendar 2018

Christmas is coming soon! After two data-driven Advent calendars (2016 & 2017), this is our third edition. This year we’ve also come up with something special: every day, we’ll have texts by awesome guest authors for you! The best data-driven journalism projects of the year, interviews and “Tutorial Thursdays” with tips and tricks from us squirrels! 🐿☃🎁

Scraping for everyone

by Sophie Rotgeri, Moritz Zajonz and Elena Erdmann

One of the most important skills for data journalists is scraping. It allows us to download any data that is openly available online as part of a website, even when it’s not meant to be downloaded: be it information about the members of parliament or – as in our Christmas-themed example – a list of Christmas markets in Germany.
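The tutorial itself needs no code, but for readers who do want to script a scraper, here is a minimal sketch in Python using only the standard library. The HTML snippet, the `market` CSS class and the market names are invented for illustration; a real scraper would first download the page, e.g. with `urllib.request`.

```python
from html.parser import HTMLParser

# Invented snippet standing in for a page that lists Christmas markets.
HTML = """
<ul>
  <li class="market">Striezelmarkt, Dresden</li>
  <li class="market">Christkindlesmarkt, Nuremberg</li>
</ul>
"""

class MarketParser(HTMLParser):
    """Collects the text of every <li class="market"> element."""

    def __init__(self):
        super().__init__()
        self.in_market = False
        self.markets = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "market") in attrs:
            self.in_market = True

    def handle_data(self, data):
        if self.in_market and data.strip():
            self.markets.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_market = False

parser = MarketParser()
parser.feed(HTML)
print(parser.markets)
# → ['Striezelmarkt, Dresden', 'Christkindlesmarkt, Nuremberg']
```

In practice you would swap the hard-coded string for the downloaded page source and adapt the tag and class names to the site you are scraping.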

Extracting geodata from OpenStreetMap with Osmfilter

A guest post by Hans Hack

When working on map-related projects, I often need specific geographical data from OpenStreetMap (OSM) for a certain area. For a recent project of mine, I needed all the roads in Germany in a useful format so that I could work with them in a GIS program. So how do I get that data? With a useful little program called Osmfilter.
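To give a flavour of how such a call looks, here is a small Python sketch that assembles an Osmfilter command line. The file names are examples, and the exact flags should be checked against the Osmfilter documentation; `--keep` selects objects by tag, and roads in OSM carry the `highway` tag.

```python
# Example file names; germany.o5m would typically be converted
# from a .pbf extract beforehand (e.g. with osmconvert).
input_file = "germany.o5m"
output_file = "roads.osm"

# Build the shell command: keep only objects tagged with any highway value.
command = f'osmfilter {input_file} --keep="highway=" -o={output_file}'
print(command)
```

Run in a terminal, a command along these lines writes an OSM file containing only the road network, which can then be loaded into a GIS program such as QGIS.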

From the data to the story: A typical ddj workflow in R

R is getting more and more popular among data journalists worldwide, as Timo Grossenbacher from SRF Data pointed out recently in a talk at the useR!2017 conference in Brussels. Working as a data trainee at Berliner Morgenpost’s Interactive Team, I can confirm that R indeed played an important role in many of our recently published projects, for example when we identified the strongholds of German parties. While we also use the software for more complex statistics from time to time, what R helps us with on a near-daily basis is cleaning, joining and superficially analyzing data. Sometimes it’s just to briefly check whether there is a story hiding in the data. But sometimes, the steps you will learn in this tutorial are just the first part of a bigger, deeper data analysis.
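The post itself works in R; as a language-agnostic illustration of the same clean–join–aggregate pattern, here is a minimal sketch in Python with invented sample data (two tiny tables standing in for raw election results and a region lookup):

```python
import csv
import io
from collections import defaultdict

# Invented sample data standing in for two raw CSV files.
results_csv = "district,party,votes\nA,Red,120\nA,Blue,80\nB,Red,60\nB,Blue,140\n"
regions_csv = "district,region\nA,North\nB,South\n"

results = list(csv.DictReader(io.StringIO(results_csv)))
regions = {row["district"]: row["region"]
           for row in csv.DictReader(io.StringIO(regions_csv))}

# Join: attach the region to each result row via the district key.
for row in results:
    row["region"] = regions[row["district"]]

# Aggregate: total votes per (region, party) pair.
totals = defaultdict(int)
for row in results:
    totals[(row["region"], row["party"])] += int(row["votes"])

print(dict(totals))
```

In R the same steps would typically be a `merge()` (or dplyr join) followed by a grouped summary; the point is the workflow, not the language.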

JavaScript: Loop through a Google Spreadsheet

In our post Coding marker maps with Leaflet.js we filled a map visualization with information given directly in the code, but we also learned how to loop through data in a JSON file to create markers and pop-ups. This time, we will loop through a public Google Spreadsheet. We will do this with the same example map as in the previous post. But this little trick will not only help you with building marker maps — it can come in handy for any JavaScript application in which you need to automate the handling of your data.

So let’s start right away with setting up a data table with Google Spreadsheets.
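The post does this in JavaScript; to show the idea in isolation, here is a Python sketch of the same loop. A sheet published to the web can typically be fetched as CSV via its export URL; the hard-coded string below stands in for that response, and the column names and coordinates are invented.

```python
import csv
import io

# Stands in for the CSV export of a published Google Spreadsheet;
# a live script would download it first, e.g. with urllib.request.
sheet_csv = ("name,lat,lon\n"
             "Striezelmarkt,51.034,13.738\n"
             "Christkindlesmarkt,49.454,11.077\n")

markers = []
for row in csv.DictReader(io.StringIO(sheet_csv)):
    # One marker per spreadsheet row, ready to hand to a mapping library.
    markers.append({
        "name": row["name"],
        "coords": (float(row["lat"]), float(row["lon"])),
    })

print(markers)
```

The JavaScript version in the tutorial follows the same shape: fetch the sheet, iterate over the rows, and create one map marker per row.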