Web scraping is one of the most useful and least understood methods for journalists to gather data. It’s the thing that helps you when, in your online research, you come across information that qualifies as data, but does not have a handy “Download” button. Here’s your guide on how to get started — without any coding necessary.
On March 24th, we hosted JournoCon 2018, a data journalism bootcamp-slash-conference for students, freelance journalists and anyone wanting to learn more about the field. To those who were there: Thank you for making it an amazing day! To those who weren’t: Here’s a recap.
We are excited to announce our first one-day data journalism boot camp/conference in Berlin! If you’re a German-speaking person interested in data-driven stuff, be sure to check it out! We offer workshops, talks and discussions around every step of the data-driven workflow.
Note: Dear English-speaking person! We’re sorry, but this event will be held in German. Stay tuned for our next events, though. There will definitely be some in English!
Soon it’s Christmas! Count down the days together with your friends from Journocode and discover a new data-driven surprise every day. This is already the second edition of our advent calendar. Time flies!
Happy holidays to all of you!
In part one of this tutorial, you learned what distance and similarity mean for data and how to measure them. Now, let’s see how we can implement distance measures in R. We’re going to look at the built-in dist() function and visualize similarities with a ggplot2 tile plot, also called a heatmap.
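To give a rough idea of what that looks like, here is a minimal sketch rather than the tutorial's actual code. The toy matrix `points` and the column names `from`, `to` and `distance` are made up for illustration. The heatmap part only runs if ggplot2 happens to be installed.

```r
# Hypothetical toy data: three records with two numeric features each
points <- matrix(c(0, 0,
                   3, 4,
                   6, 8),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("A", "B", "C"), c("x", "y")))

# dist() computes pairwise distances between rows (Euclidean by default)
d <- dist(points, method = "euclidean")

# Turn the result into a full symmetric matrix, then into long format
d_matrix <- as.matrix(d)
d_long <- as.data.frame(as.table(d_matrix))
names(d_long) <- c("from", "to", "distance")

# Draw the tile plot (heatmap) only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  heatmap_plot <- ggplot(d_long, aes(x = from, y = to, fill = distance)) +
    geom_tile()
  print(heatmap_plot)
}
```

The long format matters because ggplot2 wants one row per tile: one column for each axis and one for the fill value.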
In your work, you might encounter a situation where you want to analyze how similar your data points are to each other. Depending on the structure of your data, though, “similar” may mean very different things. For example, if you’re working with records containing real-valued vectors, the notion of similarity has to be different from the one you’d use for character strings or even whole documents. That’s why there’s a small collection of similarity measures to choose from, each tailored to different types of data and different purposes.
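To make the contrast concrete, here is a small sketch in base R. The vectors `a` and `b` and the two example words are invented for illustration. Euclidean and Manhattan distance are just two common choices for numeric vectors, and edit distance is one common choice for strings.

```r
# Two hypothetical real-valued records
a <- c(1, 2, 3)
b <- c(4, 6, 3)

# Euclidean distance: straight-line distance between the two points
euclidean <- sqrt(sum((a - b)^2))   # 5

# Manhattan distance: sum of the absolute differences per coordinate
manhattan <- sum(abs(a - b))        # 7

# For character strings, edit (Levenshtein) distance counts the
# insertions, deletions and substitutions needed to turn one into the other
edit_distance <- adist("kitten", "sitting")[1, 1]   # 3
```

The same pair of numeric records gets two different distances depending on the measure, which is why the choice should follow from the data and the question you are asking.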
A few weeks ago, we discovered it’s possible to export WhatsApp conversation logs as a .txt file. It’s quite an interesting piece of data, so we figured, why not analyze it? So here we go: A code-along R project in two steps.
- Cleaning the data: That’s what this part is for. We’ll get the .txt file ready to be properly evaluated.
- Visualizing the data: That’s what we’ll talk about in part two — creating some interesting visuals for our chat logs.
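As a taste of the cleaning step, here is a minimal sketch in base R. It is not the project's actual code. It assumes each message line looks like `24.12.17, 18:30 - Alice: Merry Christmas!`, which is only one of several formats WhatsApp produces depending on device and language. The example lines, the pattern and the column names are all hypothetical.

```r
# Hypothetical lines; in practice you would read them with readLines()
chat_lines <- c(
  "24.12.17, 18:29 - Alice created the group",
  "24.12.17, 18:30 - Alice: Merry Christmas!",
  "24.12.17, 18:31 - Bob: Same to you"
)

# Assumed layout: date, time - author: message
pattern <- "^([0-9]{2}\\.[0-9]{2}\\.[0-9]{2}), ([0-9]{2}:[0-9]{2}) - ([^:]+): (.*)$"

# Keep only lines that look like real messages (drops the system notice)
messages <- chat_lines[grepl(pattern, chat_lines)]

# Pull each captured group into its own column
chat <- data.frame(
  date    = as.Date(sub(pattern, "\\1", messages), format = "%d.%m.%y"),
  time    = sub(pattern, "\\2", messages),
  author  = sub(pattern, "\\3", messages),
  message = sub(pattern, "\\4", messages),
  stringsAsFactors = FALSE
)
```

Messages that span several lines would need extra handling, which this sketch leaves out.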
As you know by now, R is all about functions. In the event that there isn’t one for the exact thing you want to do, you can even write your own! Writing your own functions is a very useful way to automate your work. Once defined, it’s easy to call new functions as often as you need. It’s a good habit to get into when programming with R — and with lots of other languages as well.
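For a first impression, here is a tiny example. The function `to_percent()` and its `digits` argument are made up for illustration.

```r
# Hypothetical helper: turn a share (0 to 1) into a percentage label
to_percent <- function(x, digits = 1) {
  paste0(round(x * 100, digits), "%")
}

# Once defined, call it as often as you need
to_percent(0.256)                       # "25.6%"
to_percent(c(0.5, 0.123), digits = 0)   # "50%" "12%"
```

Because `digits` has a default value, you can leave it out in the common case and only set it when you need different rounding.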