In your work, you might encounter a situation where you want to analyze how similar your data points are to each other. Depending on the structure of your data though, “similar” may mean very different things. For example, if you’re working with records containing real-valued vectors, the notion of similarity has to be different than, say, for character strings or even whole documents. That’s why there’s a small collection of similarity measures to choose from, each tailored to different types of data and different purposes.
Unfortunately, data comes in all shapes and sizes. Especially when analyzing data from authorities. You’ll have to be able to deal with pdfs, fused table cells and frequent changes in terms and spelling.
While crunching numbers, a visual analysis of your data may help you get an overview of your data or compare filtered information at a glance. Aside from the built-in graphics package, R has many additional packages to help you with that.
We want to focus on ggplot2 by Hadley Wickham, which is a very nice and quite popular graphics package.
Ggplot2 is based on a kind of statistical philosophy from a book I really recommend reading. In The Grammar of Graphics, author Leland Wilkinson goes deep into the structure of quantitative plotting. As a product, he establishes a rulebook for building charts the right way. Hadley Wickham built ggplot2 to follow these aesthetics and principles.
When it comes to data journalism, visualizing your data isn’t what it’s all about. Getting and cleaning your data, analyzing and verifying your findings is way more important.
Still, an interactive eye-catcher holding interesting information will definitely not hurt your data story. Plus, you can use graphics for a visual analysis, too.
Here, we’ll show you how to build a choropleth map, where your data is visualized as colored polygon areas like countries and states.
We will code a multilayer map on Dortmunds students as an example. You’ll be able to switch between layered data from different years. The popups hold additional information on Dortmunds districts.
Data structures in R are quite different from most programming languages. Understanding them is a necessity, because they define the way you’ll work with your data. Problems in understanding data structures will probably also produce problems in your code.
As you know by now, R is all about functions. In the event that there isn’t one for the exact thing you want to do, you can even write your own! Writing your own functions is a very useful way to automate your work. Once defined, it’s easy to call new functions as often as you need. It’s a good habit to get into when programming with R — and with lots of other languages as well.
In this crash course section, we’ll talk about importing all sorts of data into R and installing fancy new packages. Also, we’ll learn to know our way around the workspace.
Your workspace in R is like the desk you work at. It’s where all the data, defined variables and other objects you’re currently working with are stored. Like with a desk, you might want to clean it every once in a while and throw out stuff you don’t need any more. There’s a few useful commands to help you do that. Take a look and try them out:
Now that you installed RStudio, learned about assignments and wrote some basic code, there’s nothing stopping you from becoming a journocoder!
To get a deeper understanding of how R stores your data, we’re now going to take a closer look at data structures in R, starting with a central concept: Vectors.