Category Archives

3 Articles

Extracting geodata from OpenStreetMap with Osmfilter

A guest post by Hans Hack

When working on map-related projects, I often need specific geographical data from OpenStreetMap (OSM) for a certain area. For a recent project of mine, I needed all the roads in Germany in a format I could work with in a GIS program. So how do I get the data to work with? With a useful little program called Osmfilter.

Similarity and distance in data: Part 2

Part 1 | Code

In part one of this tutorial, you learned what distance and similarity mean for data and how to measure them. Now, let’s see how we can implement distance measures in R. We’re going to look at the built-in dist() function and visualize similarities with a ggplot2 tile plot, also called a heatmap.

Implementation in R: the dist() function

The simplest way to compute distance measures in R is the dist() function. It works with matrices as well as data frames and has options for many of the measures we got to know in the last part.

The crucial argument here is method. It has six options — actually more like four and a half, but you’ll see:
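In base R, the documented values for method are "euclidean" (the default), "maximum", "manhattan", "canberra", "binary" and "minkowski". As a minimal sketch of how the call looks, assuming a small made-up matrix (the values and the names m, d_euclid, d_df and d_manhattan are illustrative only, not taken from the tutorial):

```r
# Three made-up observations (rows) measured on two variables (columns)
m <- matrix(c(0, 0,
              3, 4,
              6, 8),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("x", "y")))

# Without a method argument, dist() computes Euclidean distances
# between all pairs of rows and returns a "dist" object
d_euclid <- dist(m)

# The same call works on a data frame with numeric columns
d_df <- dist(as.data.frame(m))

# A different measure is selected through the method argument
d_manhattan <- dist(m, method = "manhattan")

# A "dist" object stores only the lower triangle; as.matrix() expands it
# into a full symmetric matrix, which is easier to inspect or reshape
print(as.matrix(d_euclid))
print(as.matrix(d_manhattan))
```

Expanding the result with as.matrix() is also a convenient starting point when the distances need to be reshaped for plotting.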


Similarity and distance in data: Part 1

Part 2

In your work, you might encounter a situation where you want to analyze how similar your data points are to each other. Depending on the structure of your data, though, “similar” may mean very different things. For example, if you’re working with records containing real-valued vectors, the notion of similarity has to be different from the one you would use for character strings or even whole documents. That’s why there’s a small collection of similarity measures to choose from, each tailored to different types of data and different purposes.
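To make the contrast concrete, here is a small sketch in R, the language used in part two. It assumes Euclidean distance for the numeric vectors and edit (Levenshtein) distance for the strings; these two measures and the example values are chosen purely for illustration and are not necessarily the ones the tutorial goes on to cover.

```r
# Two made-up real-valued records
v1 <- c(1, 2, 3)
v2 <- c(2, 4, 6)

# Euclidean distance: square root of the summed squared differences
d_numeric <- sqrt(sum((v1 - v2)^2))

# Two made-up character strings
s1 <- "kitten"
s2 <- "sitting"

# adist() from the utils package (shipped with base R) counts the
# insertions, deletions and substitutions needed to turn s1 into s2
d_string <- utils::adist(s1, s2)[1, 1]

print(d_numeric)
print(d_string)
```

Both results are "distances", but they answer different questions: the first compares magnitudes coordinate by coordinate, while the second counts editing steps between sequences of characters.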