Your first interactive choropleth map with R

Your first interactive choropleth map with R

When it comes to data journalism, visualizing your data isn’t what it’s all about. Getting and cleaning your data, analyzing and verifying your findings is way more important.

Still, an interactive eye-catcher holding interesting information will definitely not hurt your data story. Plus, you can use graphics for a visual analysis, too.

Here, we’ll show you how to build a choropleth map, where your data is visualized as colored polygon areas like countries and states.
We will code a multilayer map on Dortmunds students as an example. You’ll be able to switch between layered data from different years. The popups hold additional information on Dortmunds districts.

Now for the data

First of all you need to read a kml-file into R. KML stands for Keyhole Markup Language and as I just learned from the comment section of this tutorial it is a XML data format used to display geospatial information in a web browser. With a bit of googling, you’ll find kml-files holding geospatial informations of your own city, state or country. For this example, we’ll use this data on Dortmunds districts. Right click the link and save the file. Download the kml-file and save it to a new directory named “journocode” (or anything you want, really, but we’ll work with this for now).

Start RStudio. If you haven’t installed it yet, have a look at our first R Tutorial post. After starting RStudio, open a new R script and save it to the right directory. For example, if your “journocode”-directory was placed on your desktop (and your Username was MarieLou), type

Remember to use a normal slash (/) in your file path instead of a backslash. Now, we can read the shape file directly into R. If you don’t use our example data, try open your kml-file with a text editor first to look for the layer name! As you can see on this screenshot, for “Statistische Bezirke.kml” we have a layer named “Statistische_Bezirke”, defined in row four, and utf-8 encoding (see row 1), since we have the german umlauts “ä”, “ö” and “ü” in our file.

Bildschirmfoto 2016-01-22 um 12.31.12

Let’s load the data into R. We’ll do this with a function from the rgdal-package.

If you get an Error that says “Cannot open data source”, chances are there’s something wrong with your file name. Check that your working directory is properly set and that the file name is correct. Some browsers will change the .kml fily type to .txt, or even just add the .txt ending so you get “filename.kml.txt”. You’ll usually find the “layer” argument in your text file, named something like “name” or “id”, as shown above.

Did it work? Try to plot the polygons with the generic plot() function:

You should now see the black outlines of your polygons. Neat, isn’t it?

Next, we’ll need a little data to add to our map. To show you how to build a multilayer map, we will use two different csv files:   student1 & student2

The data contains information on the percentage of 18 to 25 year olds living in Dortmund in 2000 and 2014. Download the files and save them to your journocode directory. Make sure they’re still named student1 and student2.

This can be tricky sometimes: For our data, the encoding is “latin1” and the separation marks are commas. Open the csv files with a text editor to check if your separator is a comma, a semicolon or even a slash.

If everything worked out for you, celebrate a little! You’re a big step closer to your multilayer map!

 

Now for the interactive part

After looking through your data and analyzing it, you will now have some important information on how many values you have, which are the smallest and the biggest. For our example, we did that for you:

The highest value is 26%, so we can now think of a color scale from 0 to 26 to fill in our map. There are different statistical ways to decide what classes we want to divide our data into. For this mapping exercise, we will simply take eight classes: 0-5, 5-8, 8-10, 10-12, 12-14, 14-18, 18-24 and 24-26.

For every class, we want our map to fill the polygons in a different color. We’ll use a color vector generated with ColorBrewer here. Just copy the colour code you want, put it in a vector and replace it in the code. To paste the colors to the classes, use the function colorBin(). This is were you’ll need the package leaflet, which we will use to build our map. Install it, if you haven’t already.

Next up is the little infowindow we want to pop up when we click on the map. As you can see, I used some html code to specify some parameters for the first popup. For the second popup, I used a simpler way.

paste0() does the same thing as paste() but with no default separator. Check ?paste0 for more info. If something doesn’t work, check the punctuation!

 

Now for the map

After that, we’ll start right away with puzzling together all the parts we need:

The %>% operator is special to the leaflet package. Similar to the “+” in ggplot, it’s used to link two functions together. So remember: If you have a “%>%” opearator at the end of the line, R will expect more input from you.

The call to the function leaflet() starts the mapping procedd. The Provider Tile is your map base and background. If you don’t want to use the grey Tile in the example, have a look at this page and choose your own. Don’t worry if no map appears yet. With leaflet, you won’t see the actual map right away. First we’ll add the polygon layers and the popups we’ve defined to our map:

In our map, we want to be able to switch layers by clicking on a layer control panel with the group names. We’ll code that now:

Next, we want to add a thin color legend that shows the minimum and maximum value and the palette colors

The big moment: did it work? No mistake with the brackets or the punctuation? You’ll find out by typing:

Congratulations! You made your first multilayer choropleth with R! Now have fun building multilayer maps of your own city/country or even the whole world! If you want to publish your map, make sure you have the “htmlwidgets” package installed and add the following code to your script:

This will create a directory named “mymap_files” and a “mymap.html”-file. Save these two files in the same directory and load that on to your server. Et voilà: Your map is online!

If you publish a map based on our tutorial, feel free to link to our webpage and tell your fellows! We’d be delighted!

 

{Credits for the awesome featured image go to Phil Ninh}

Comments ( 23 )

  1. ReplyНовая-шина.рф
    I've been experimenting with choropleth mapping techniques in R, having reaped the benefits of ggplot2 for creating beautiful graphs within a powerful data analysis package.
  2. ReplySF99
    Finally a clear, step X step tutorial on Choropleth Maps. It works on my Rstudio. Keep similar tutorial articles coming... Thank you / Danke, Marie-Louise! SF99 - Pale Moon 26.1.0 and FF 44 - Ubuntu Linux 12.04 (32-bit)
  3. Replybharath
    Thank you so much. This blog was super helpful to understand and develop multi-layer map. I do have a question - how do I identify the correct layer in a KML file. I am trying to visualize the 2016 primary results, and when I get the kml file for counties, I am not sure how to read the correct layer. Reading the first layer in the file like how you did here is giving me macro level information.
    • ReplyMarie-Louise Timcke
      I am happy to hear that this tutorial helped you! To your problem: This depends on your kml-file. I guess it contains more than one layer? Have you tried to take the other layernames then? If you can't determine the name of the right layer you could send the file to me and I'll look through it. In my case it was really easy to find out the right name when I opened the file with the text editor but in your case it may be hiding.
      • Replybharath
        Thank you, Marie. I appreciate your help here, I am new to R programming and spatial analysis, and I am finding it difficult to comprehend things that might be quite basic. I am reading up on how to understand spatial data, I will reach out to you if I still do not make any progress. Thanks!
  4. ReplyLuis
    "kml-files are shape files containing geodata". Nope. For the sake of precision KML stands for Keyhole Markup Language It is a XML data format used to display geospatial information in a web browser, and is was designed by Google to power Google Earth. Shapefiles are a data format devised by ESRI and it has been a de facto standard for geospatial info dispaly and analy sis. That said, congrats for this very useful package which adds to the huge library of geospatial tools offered by R deve lopers. And, best of all with all the power of ggplot2 to produce stunning maps !
    • ReplyMarie-Louise Timcke
      Thank you very much for this usefull information on kml-files! I'll correct that!
  5. ReplySaurabh
    Marie, thank you for the article. It could be a good way to show the variation of a variable over a time period (for e.g. rainfall received over a period of 10 years) without creating a Shiny app. However, one would have to add 10 polygon layers resulting in 10 radio buttons (each for an year) which is a little cumbersome. Do you think there is a way around that?
    • ReplyMarie-Louise Timcke
      Great idea! I guess a select menue like in our shiny app would be a good solution to this problem, though I have to check how to implement this myself. But it should be possible. Maybe have a look at http://stackoverflow.com/questions/32181421/r-leaflet-is-it-possible-to-have-a-two-column-layer-control-panel. I'll let you know when I find something!
  6. ReplyJohannes
    Just for the case that someone uses Ubuntu and is not able to install the R-package "rgdal", I had the same problem but solved it with installing the missing packages via: sudo apt-get install apt-file apt-file update sudo apt-get install libgdal-dev libproj-dev apt-file update Best wishes, Johannes
  7. ReplyPhil
    Hi Marie, worked perfectly - thanks a lot! You may have a suggestion for using a timeslider instead of buttons to switch years? I got data according to a period of time of 7 years, so a slider would be perfect.
    • ReplyMarie-Louise Timcke
      Sorry, I must have overlooked your question. I think I have a suggestion for you somewhere, just need to remind it. Or maybe you did find a solution by yourself already?
  8. Coding marker maps with Leaflet.js | Journocode
    […] an earlier post, “Your first choropleth map“, we used Leaflet as well, but coded the map using the Leaflet R package, which works like a […]
  9. Journocode-Beitrag: Marker Map mit leaflet.js – Datentäter
    […] an earlier post, „Your first choropleth map„, we used Leaflet as well, but coded the map using the Leaflet R package, which works like a […]
  10. R: Plotting Germany Map with ggplot2 / tillt.net
    […] Your first choropleth map with R (leaflet, interactive) […]
  11. ReplyBart
    hi Marie, i have used your example to create a map with the municipalities in the Netherlands and a value per municipality. But the fill-colors per polygon are completely wrong. When i use addMarkers() my data is shown in the correct municipalities. My question is how are the values in the data-file (from 'students' in your example) plotted into the corresponding polygons from the shapefile ('dortmund' in your example)? Is it based on the same order of 'rows' in both dortmund shape- and student data files? If so, how can I deal with missing rows in the data files (there are more municipalities in my situation than i have data). Is it possible to somehow define a hard relation between the data in the 'student' files and the polygons in the shape file (eg based on the Bezirk column)?
    • ReplyMarie-Louise Timcke
      Hey Bart, I am sorry there's a problem with your colors. In my example code (and it's not the perfect way, I was still a beginner at that time), the colors are paired with the values based on the colorvector "palette": palette <- colorBin(c('#fee0d2', #an example color scheme. you can substitute your own colors '#fcbba1', '#fc9272', '#fb6a4a', '#ef3b2c', '#cb181d', '#a50f15', '#67000d'), bins = c(0, 5, 8, 10, 12, 14, 18, 24, 26)) I pair 0 with #fee0d2, 5 with #fcbba1, 8 with #fc9272 and so on. You have several options when dealing with missing values: You can clear them out of your data.frame, e.g. with na.omit(df) or df <- df[!is.na(df)]. Or you could give "NA"-values an own color different from the other values in your map, for example grey. You're absolutely right, you can match your data with the shapefile via any kind of ID they have in common. For example, if both had a column "ID": new <- merge(shape, data, by="ID"). Or, if they have the same column but with different names new <- merge(shape, data, by.x="ID", by.y="Bezirk"). Does that answer your question, or is there another problem?
  12. ReplyLauren
    Hi Marie, this is a wonderful tutorial! I appreciate it! I am finishing a project, and I am having the hardest time color coding markers by the category of a variable and then adding a second legend (to the type you created) for those markers. Do you have any idea how I can achieve this?
    • ReplyMarie-Louise Timcke
      Hey Lauren, sorry for the late reply. Great that you liked the tutorial. Did you figure out your problem or do you still have trouble color coding? I think I know how to do it but it slightly depends on your code. Could you supply an example of your data and the R file? Or maybe create an issue at StackOverflow and send me the link?
  13. ReplyPaul
    Hi Marie, thanks for the very useful tutorial! I had a similar problem to Bart in that my fill colours and popup labels were being allocated completely randomly which I could only resolve by creating a common iso2c country code ID column between the shapefile and the dataset I'm using (global life expectancy data) and then merging into a single dataset. Could you explain at what point in your process the shapefile recognises the names of the districts in your dataset and then assigns the labels and values to the appropriate location on the map? It looks like the only reference to the student data in the map function is 'student1$Anteil' which is a vector of numbers so I'm struggling to work out how each dataset is communicating with one another and allocating the data to the correct polygon on the map. Thanks!
  14. Replyap
    Hi Marie, thanks for tutorial. I have problem in finding KML file on Iran provinces. I tried to make on in google map but when I open the file in R there is no Layer name to be loaded. can you help me please?thanks

Leave a reply

Your email address will not be published.

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>