Your first interactive choropleth map with R

Your first interactive choropleth map with R

When it comes to data journalism, visualizing your data isn’t what it’s all about. Getting and cleaning your data, analyzing and verifying your findings is way more important.

Still, an interactive eye-catcher holding interesting information will definitely not hurt your data story. Plus, you can use graphics for a visual analysis, too.

Here, we’ll show you how to build a choropleth map, where your data is visualized as colored polygon areas like countries and states.
We will code a multilayer map on Dortmunds students as an example. You’ll be able to switch between layered data from different years. The popups hold additional information on Dortmunds districts.

Now for the data

First of all you need to read a kml-file into R. KML stands for Keyhole Markup Language and as I just learned from the comment section of this tutorial it is a XML data format used to display geospatial information in a web browser. With a bit of googling, you’ll find kml-files holding geospatial informations of your own city, state or country. For this example, we’ll use this data on Dortmunds districts. Right click the link and save the file. Download the kml-file and save it to a new directory named “journocode” (or anything you want, really, but we’ll work with this for now).

Start RStudio. If you haven’t installed it yet, have a look at our first R Tutorial post. After starting RStudio, open a new R script and save it to the right directory. For example, if your “journocode”-directory was placed on your desktop (and your Username was MarieLou), type

Remember to use a normal slash (/) in your file path instead of a backslash. Now, we can read the shape file directly into R. If you don’t use our example data, try open your kml-file with a text editor first to look for the layer name! As you can see on this screenshot, for “Statistische Bezirke.kml” we have a layer named “Statistische_Bezirke”, defined in row four, and utf-8 encoding (see row 1), since we have the german umlauts “ä”, “ö” and “ü” in our file.

Bildschirmfoto 2016-01-22 um 12.31.12

Let’s load the data into R. We’ll do this with a function from the rgdal-package.

If you get an Error that says “Cannot open data source”, chances are there’s something wrong with your file name. Check that your working directory is properly set and that the file name is correct. Some browsers will change the .kml fily type to .txt, or even just add the .txt ending so you get “filename.kml.txt”. You’ll usually find the “layer” argument in your text file, named something like “name” or “id”, as shown above.

Did it work? Try to plot the polygons with the generic plot() function:

You should now see the black outlines of your polygons. Neat, isn’t it?

Next, we’ll need a little data to add to our map. To show you how to build a multilayer map, we will use two different csv files:   student1 & student2

The data contains information on the percentage of 18 to 25 year olds living in Dortmund in 2000 and 2014. Download the files and save them to your journocode directory. Make sure they’re still named student1 and student2.

This can be tricky sometimes: For our data, the encoding is “latin1” and the separation marks are commas. Open the csv files with a text editor to check if your separator is a comma, a semicolon or even a slash.

If everything worked out for you, celebrate a little! You’re a big step closer to your multilayer map!

 

Now for the interactive part

After looking through your data and analyzing it, you will now have some important information on how many values you have, which are the smallest and the biggest. For our example, we did that for you:

The highest value is 26%, so we can now think of a color scale from 0 to 26 to fill in our map. There are different statistical ways to decide what classes we want to divide our data into. For this mapping exercise, we will simply take eight classes: 0-5, 5-8, 8-10, 10-12, 12-14, 14-18, 18-24 and 24-26.

For every class, we want our map to fill the polygons in a different color. We’ll use a color vector generated with ColorBrewer here. Just copy the colour code you want, put it in a vector and replace it in the code. To paste the colors to the classes, use the function colorBin(). This is were you’ll need the package leaflet, which we will use to build our map. Install it, if you haven’t already.

Next up is the little infowindow we want to pop up when we click on the map. As you can see, I used some html code to specify some parameters for the first popup. For the second popup, I used a simpler way.

paste0() does the same thing as paste() but with no default separator. Check ?paste0 for more info. If something doesn’t work, check the punctuation!

 

Now for the map

After that, we’ll start right away with puzzling together all the parts we need:

The %>% operator is special to the leaflet package. Similar to the “+” in ggplot, it’s used to link two functions together. So remember: If you have a “%>%” opearator at the end of the line, R will expect more input from you.

The call to the function leaflet() starts the mapping procedd. The Provider Tile is your map base and background. If you don’t want to use the grey Tile in the example, have a look at this page and choose your own. Don’t worry if no map appears yet. With leaflet, you won’t see the actual map right away. First we’ll add the polygon layers and the popups we’ve defined to our map:

In our map, we want to be able to switch layers by clicking on a layer control panel with the group names. We’ll code that now:

Next, we want to add a thin color legend that shows the minimum and maximum value and the palette colors

The big moment: did it work? No mistake with the brackets or the punctuation? You’ll find out by typing:

Congratulations! You made your first multilayer choropleth with R! Now have fun building multilayer maps of your own city/country or even the whole world! If you want to publish your map, make sure you have the “htmlwidgets” package installed and add the following code to your script:

This will create a directory named “mymap_files” and a “mymap.html”-file. Save these two files in the same directory and load that on to your server. Et voilà: Your map is online!

If you publish a map based on our tutorial, feel free to link to our webpage and tell your fellows! We’d be delighted!

 

{Credits for the awesome featured image go to Phil Ninh}

Comments ( 29 )

  1. ReplyНовая-шина.рф
    I've been experimenting with choropleth mapping techniques in R, having reaped the benefits of ggplot2 for creating beautiful graphs within a powerful data analysis package.
  2. ReplySF99
    Finally a clear, step X step tutorial on Choropleth Maps. It works on my Rstudio. Keep similar tutorial articles coming... Thank you / Danke, Marie-Louise! SF99 - Pale Moon 26.1.0 and FF 44 - Ubuntu Linux 12.04 (32-bit)
  3. Replybharath
    Thank you so much. This blog was super helpful to understand and develop multi-layer map. I do have a question - how do I identify the correct layer in a KML file. I am trying to visualize the 2016 primary results, and when I get the kml file for counties, I am not sure how to read the correct layer. Reading the first layer in the file like how you did here is giving me macro level information.
    • ReplyMarie-Louise Timcke
      I am happy to hear that this tutorial helped you! To your problem: This depends on your kml-file. I guess it contains more than one layer? Have you tried to take the other layernames then? If you can't determine the name of the right layer you could send the file to me and I'll look through it. In my case it was really easy to find out the right name when I opened the file with the text editor but in your case it may be hiding.
      • Replybharath
        Thank you, Marie. I appreciate your help here, I am new to R programming and spatial analysis, and I am finding it difficult to comprehend things that might be quite basic. I am reading up on how to understand spatial data, I will reach out to you if I still do not make any progress. Thanks!
  4. ReplyLuis
    "kml-files are shape files containing geodata". Nope. For the sake of precision KML stands for Keyhole Markup Language It is a XML data format used to display geospatial information in a web browser, and is was designed by Google to power Google Earth. Shapefiles are a data format devised by ESRI and it has been a de facto standard for geospatial info dispaly and analy sis. That said, congrats for this very useful package which adds to the huge library of geospatial tools offered by R deve lopers. And, best of all with all the power of ggplot2 to produce stunning maps !
    • ReplyMarie-Louise Timcke
      Thank you very much for this usefull information on kml-files! I'll correct that!
  5. ReplySaurabh
    Marie, thank you for the article. It could be a good way to show the variation of a variable over a time period (for e.g. rainfall received over a period of 10 years) without creating a Shiny app. However, one would have to add 10 polygon layers resulting in 10 radio buttons (each for an year) which is a little cumbersome. Do you think there is a way around that?
    • ReplyMarie-Louise Timcke
      Great idea! I guess a select menue like in our shiny app would be a good solution to this problem, though I have to check how to implement this myself. But it should be possible. Maybe have a look at http://stackoverflow.com/questions/32181421/r-leaflet-is-it-possible-to-have-a-two-column-layer-control-panel. I'll let you know when I find something!
  6. ReplyJohannes
    Just for the case that someone uses Ubuntu and is not able to install the R-package "rgdal", I had the same problem but solved it with installing the missing packages via: sudo apt-get install apt-file apt-file update sudo apt-get install libgdal-dev libproj-dev apt-file update Best wishes, Johannes
  7. ReplyPhil
    Hi Marie, worked perfectly - thanks a lot! You may have a suggestion for using a timeslider instead of buttons to switch years? I got data according to a period of time of 7 years, so a slider would be perfect.
    • ReplyMarie-Louise Timcke
      Sorry, I must have overlooked your question. I think I have a suggestion for you somewhere, just need to remind it. Or maybe you did find a solution by yourself already?
  8. Coding marker maps with Leaflet.js | Journocode
    […] an earlier post, “Your first choropleth map“, we used Leaflet as well, but coded the map using the Leaflet R package, which works like a […]
  9. Journocode-Beitrag: Marker Map mit leaflet.js – Datentäter
    […] an earlier post, „Your first choropleth map„, we used Leaflet as well, but coded the map using the Leaflet R package, which works like a […]
  10. R: Plotting Germany Map with ggplot2 / tillt.net
    […] Your first choropleth map with R (leaflet, interactive) […]
  11. ReplyBart
    hi Marie, i have used your example to create a map with the municipalities in the Netherlands and a value per municipality. But the fill-colors per polygon are completely wrong. When i use addMarkers() my data is shown in the correct municipalities. My question is how are the values in the data-file (from 'students' in your example) plotted into the corresponding polygons from the shapefile ('dortmund' in your example)? Is it based on the same order of 'rows' in both dortmund shape- and student data files? If so, how can I deal with missing rows in the data files (there are more municipalities in my situation than i have data). Is it possible to somehow define a hard relation between the data in the 'student' files and the polygons in the shape file (eg based on the Bezirk column)?
    • ReplyMarie-Louise Timcke
      Hey Bart, I am sorry there's a problem with your colors. In my example code (and it's not the perfect way, I was still a beginner at that time), the colors are paired with the values based on the colorvector "palette": palette <- colorBin(c('#fee0d2', #an example color scheme. you can substitute your own colors '#fcbba1', '#fc9272', '#fb6a4a', '#ef3b2c', '#cb181d', '#a50f15', '#67000d'), bins = c(0, 5, 8, 10, 12, 14, 18, 24, 26)) I pair 0 with #fee0d2, 5 with #fcbba1, 8 with #fc9272 and so on. You have several options when dealing with missing values: You can clear them out of your data.frame, e.g. with na.omit(df) or df <- df[!is.na(df)]. Or you could give "NA"-values an own color different from the other values in your map, for example grey. You're absolutely right, you can match your data with the shapefile via any kind of ID they have in common. For example, if both had a column "ID": new <- merge(shape, data, by="ID"). Or, if they have the same column but with different names new <- merge(shape, data, by.x="ID", by.y="Bezirk"). Does that answer your question, or is there another problem?
  12. ReplyLauren
    Hi Marie, this is a wonderful tutorial! I appreciate it! I am finishing a project, and I am having the hardest time color coding markers by the category of a variable and then adding a second legend (to the type you created) for those markers. Do you have any idea how I can achieve this?
    • ReplyMarie-Louise Timcke
      Hey Lauren, sorry for the late reply. Great that you liked the tutorial. Did you figure out your problem or do you still have trouble color coding? I think I know how to do it but it slightly depends on your code. Could you supply an example of your data and the R file? Or maybe create an issue at StackOverflow and send me the link?
  13. ReplyPaul
    Hi Marie, thanks for the very useful tutorial! I had a similar problem to Bart in that my fill colours and popup labels were being allocated completely randomly which I could only resolve by creating a common iso2c country code ID column between the shapefile and the dataset I'm using (global life expectancy data) and then merging into a single dataset. Could you explain at what point in your process the shapefile recognises the names of the districts in your dataset and then assigns the labels and values to the appropriate location on the map? It looks like the only reference to the student data in the map function is 'student1$Anteil' which is a vector of numbers so I'm struggling to work out how each dataset is communicating with one another and allocating the data to the correct polygon on the map. Thanks!
    • ReplyMarie-Louise Timcke
      Hey Paul! Sorry, I somehow overlooked your question! The simple answer is: At no point! They just both have exactly the same order and the code just matches the first polygon of the geodata with the first row of the dataset. Of course it's possible that your data has a different order than the geodata. Then you have to do exactly what you did: Somehow rearrange the dataset to have the same order the geodata has or merge both data sources before plotting the map.
  14. Replyap
    Hi Marie, thanks for tutorial. I have problem in finding KML file on Iran provinces. I tried to make on in google map but when I open the file in R there is no Layer name to be loaded. can you help me please?thanks
    • ReplyMarie-Louise Timcke
      One geodata source that has iranian borders and provinces is https://wambachers-osm.website/boundaries/. Hope this helps!
  15. ReplyHenry Ananiev
    Hi Marie, thanks for awesome information. It's give me a strange step in visualisation journey! However, my problem is how to compare the area of the data area of the card ? Below a test example where I extract the site names from the downloaded maps in R, bind random numbers and see what the pop-up signatures do not match the real territory on the map. I do not understand how correctly to compare? m <- rgdal::readOGR(dsn = "C:\\Users\\AnanevHA\\Downloads\\regions2010_wgs\\regions2010_wgs.KML", verbose = T,use_iconv = T,encoding = "utf-8",drop_unsupported_fields = T) reg <- stringr::str_match(string = m$Description, pattern = "(region)(.+)()") reg <- unique(reg[,3]) layer.1 <- data.frame(reg,rar=floor(rnorm(n = length(reg),mean = 1000,sd = 400)),stringsAsFactors = F) layer.1$kb <- round(layer.1$rar*runif(n = length(reg)),0) layer.1$share <- round(layer.1$kb/layer.1$rar,4)*100 head(layer.1,11) layer.2 <- data.frame(reg,rar=layer.1$rar,stringsAsFactors = F) layer.2$kb <- round(layer.2$rar*runif(n = length(reg)),0) layer.2$share <- round(layer.2$kb/layer.2$rar,4)*100 head(layer.2,11) library(leaflet) palette <- colorBin(c('#a50026','#d73027','#f46d43','#fdae61','#fee08b','#ffffbf','#d9ef8b','#a6d96a','#66bd63','#1a9850','#006837'), bins = c(0, 10, 20, 30, 40, 50, 60,70, 80,90, 100)) info.1 <- paste0("Детализация охвата КБ", "Территория: ", layer.1$reg, "Лицензиатов: ", layer.1$rar ,"КБ: ", layer.1$kb ,"Доля,%: ", layer.1$share ) info.2 <- paste0("Детализация охвата КБ", "Территория: ", layer.2$reg, "Лицензиатов: ", layer.2$rar ,"КБ: ", layer.2$kb ,"Доля,%: ", layer.2$share ) mymap % addProviderTiles("Esri.WorldStreetMap" # options = tileOptions(minZoom=10, maxZoom=16) ) %>% #"freeze" the mapwindow to max and min zoomlevel addPolygons(data = m, fillColor = ~ palette(layer.1$share), ............................................................................................................ # next the same code that you have
    • ReplyMarie-Louise Timcke
      Hey Henry, thank you very much for your positive feedback and your question! I just didn't get it exactly: Is your problem the same jolie and Paul had? Maybe if you could upload the .KML file you use somewhere (e.g. GitHub), I could run your code by myself and see what the problem is and how we can fix it!
  16. Replyjolie
    Hi Marie, I have the same question as Paul: How can I link a KML file to an external data set? How does it know which fields to match? Thanks a lot for your guidance!
    • ReplyMarie-Louise Timcke
      Hey Jolie, thanks for your question. The problem you both have is that your datasets are in a different order than your geodata. Geodata always has a plotting order: The first geopoint is plotted first, the second is plotted second etc.. A grouping ID determines which geopoints will be grouped as one polygon. In this tutorial, I don't merge or match the dataset to the geodata, because they‘re both in the same order already. The first polygon of the .KML file is the district named City. And the City district is the first district of the students data. If you have a dataset that‘s not in the same order as the geodata, you either have to merge the data to the geodata by a column they both have in common or rearrange the dataset to match the order of the polygons in the geofile. Whatever option you want to take, never rearrange the order of the geodata but only the order of the additional datasets! My examplary data doesn‘t have any column both have in common. If they weren't arranged in the same order, I'd have to find additional information on which ID refers to which district. You can often find this information in the same place you got the geodata from. But most geodatasets have a column with some sort of statistical ID or the districts‘ names, which you can use to reorder your additional datasets. I hope this helps you!

Leave a reply

Your email address will not be published.

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>