Similarity and distance in data: Part 2

Similarity and distance in data: Part 2

Part 1 | Code

In part one of this tutorial, you learned about what distance and similarity mean for data and how to measure it. Now, let’s see how we can implement distance measures in R. We’re going to look at the built-in dist() function and visualize similarities with a ggplot2 tile plot, also called a heatmap.

Implementation in R: the dist() function

The simplest way to do distance measures in R is the dist() function. It works with matrices as well as data frames and has options for a lot of the measures we’ve gotten to know in the last part.

The crucial argument here is method. It has six options — actually more like four and a half, but you’ll see:

Project: Visualizing WhatsApp chat logs –
Part 1: Cleaning the data

by Kira Schacht 1 Comment
Project: Visualizing WhatsApp chat logs – <br>Part 1: Cleaning the data

Part 2 | Code

A few weeks ago, we discovered it’s possible to export WhatsApp conversation logs as a .txt file. It’s quite an interesting piece of data, so we figured, why not analyze it? So here we go: A code-along R project in two steps.

  1. Cleaning the data: That’s what this part is for. We’ll get the .txt file ready to be properly evaluated.
  2. Visualizing the data: That’s what we’ll talk about in part two — creating some interesting visuals for our chat logs.

R exercise: Analysing data

R exercise: Analysing data

While using R for your everyday calculations is so much more fun than using your smartphone, that’s not the (only) reason we’re here. So let’s move on to the real thing: How to make data tell us a story.

First you’ll need some data. You haven’t learned how to get and clean data, yet. We’ll get to that later. For now you can practice on this data set. The data journalists at Berliner Morgenpost used it to take a closer look at refugees in Germany and kindly put the clean data set online. You can also play around with your own set of data. Feel free to look for something entertaining on the internet – or in hidden corners of your hard drive. Remember to save your data in your working directory to save yourself some unneccessary typing.