Project: Visualizing WhatsApp chat logs –
Part 2: Visualization

by Moritz Zajonz

Part 1 | Code

If you followed part one of this project, you should now have a clean data set that you can work with. Now, we’ll create some pretty visualizations with it. This is how it will look:

[Figure: Rplot02, a preview of the finished plot]

In this post, we’ll go through the visualizations line by line. On our GitHub page, you can find the code for cleaning and visualization in its entirety.

Requirements

We could, theoretically, use R's built-in graphics package. But there are other ways to make much more aesthetically pleasing graphics in R. We'll use a package named ggplot2 here. Install it once with install.packages("ggplot2"), then load it in every new session with library(ggplot2). (require("ggplot2") also loads the package, but it doesn't install it.) For more info on installing packages, check out our tutorial.

If you don’t have the chat dataset in your workspace anymore, the next step is to load in the “.Rdata” file you prepared. Like this:

load("C:/Users/yourname/folder/anotherfolder/lastfolderipromise/whatsapp_cleaned.Rdata")

Hint: Notice the slashes? They're forward slashes, not backslashes. If you copy file paths from Windows Explorer, you get backslashes. You don't want those. They ruin everything, because R treats the backslash as an escape character inside strings. You want forward slashes. So change them (doubling every backslash works, too).
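If you don't feel like fixing a long path by hand, you can let R do it. Here is a minimal sketch; the path is made up for illustration, and the backslashes are doubled because that's how a single backslash has to be written in R source code.

```r
# A path as it would be copied from Windows Explorer (hypothetical example).
# Each "\\" below is ONE backslash in the actual string.
win_path <- "C:\\Users\\yourname\\folder\\whatsapp_cleaned.Rdata"

# Replace every backslash with a forward slash.
# fixed = TRUE means the pattern is taken literally, not as a regular expression.
r_path <- gsub("\\", "/", win_path, fixed = TRUE)

print(r_path)
```

The resulting string can be passed straight to load().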

Your workspace should now contain the cleaned chat log, ready for plotting. In the examples below, it's a data frame called chat. With the View() function, for example View(chat), you can check if your data looks about right.
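If you want to check the loaded data without opening the viewer, something like the following sketch works as well. It uses a tiny made-up stand-in for the chat log and a temporary file, so it runs on its own; the column text is an assumption for illustration, only name and time are used in this post.

```r
# Toy stand-in for the cleaned chat log (names, texts and times are made up).
chat <- data.frame(
  name = c("Anna", "Ben", "Anna"),
  text = c("Hi!", "Morning", "Still awake?"),
  stringsAsFactors = FALSE
)
# Add the POSIXlt column separately: data.frame() would convert it to POSIXct.
chat$time <- as.POSIXlt(c("2015-11-02 08:15:00", "2015-11-02 08:17:00",
                          "2015-11-03 02:40:00"), tz = "UTC")

# Save and reload it like the real file, but in a temporary folder.
path <- file.path(tempdir(), "whatsapp_cleaned.Rdata")
save(chat, file = path)
rm(chat)
loaded <- load(path)  # load() returns the names of the objects it restored

print(loaded)  # which objects came back
str(chat)      # column names and classes
head(chat)     # first rows, a console alternative to View()
```

With your own file, only the load(), str() and head() lines are needed.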

Visualization

The ggplot2 package is very powerful and provides an entirely different graphics system for R. Its grammar differs quite a bit from the base graphics package and will probably take some getting used to. If you want to get to know ggplot2 a little better, these YouTube tutorials by Roger Peng are a good start.

Now to the next steps. With ggplot2, you can define almost every part of your plot: the data, the axes, the colors, the titles and the labels. We will show you two examples to get you started.

For example, it might be interesting to take a look at who writes the most before 6 am in a group chat. Remember when we converted the first column to a date-time format and called it time? Well, one of the nice things about date-time objects is that we can easily extract only the hours, weekdays or minutes from our timestamp. (This works because time is stored as a POSIXlt object, which keeps components such as hour, min and wday separately.) You can do this with the “$” operator we use to target elements of a list or columns of a data frame. Like this:

Note: This example is taken from this page, which contains a comprehensive explanation of dates and times in R.
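As a rough sketch of the same idea (this is not the example from the linked page, and the timestamp is made up):

```r
# Parse a made-up timestamp into a POSIXlt object.
stamp <- as.POSIXlt("2015-11-03 02:40:15", tz = "UTC")

stamp$hour      # hour of the day, 0 to 23
stamp$min       # minute, 0 to 59
stamp$wday      # day of the week as a number, 0 to 6, starting on Sunday
weekdays(stamp) # name of the weekday (language depends on your locale)
```

Applied to a whole column, such as chat$time$hour, you get one value per message.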

So if we want to extract only the rows with messages written before 6 am from our chat data frame, we can do that with chat[chat$time$hour < 6, ]. In a ggplot function, it might be used like this:
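Here is a minimal, self-contained sketch of such a call. The toy chat data frame, the titles and the labels are made up for illustration, so treat it as one possible version rather than the exact code from our GitHub page.

```r
library(ggplot2)

# Toy stand-in for the cleaned chat log (names and timestamps are made up).
chat <- data.frame(
  name = c("Anna", "Ben", "Anna", "Cem", "Ben", "Anna"),
  stringsAsFactors = FALSE
)
# Add the POSIXlt column separately: data.frame() would convert it to POSIXct.
chat$time <- as.POSIXlt(c("2015-11-02 01:10:00", "2015-11-02 01:45:00",
                          "2015-11-02 03:05:00", "2015-11-02 03:30:00",
                          "2015-11-02 09:00:00", "2015-11-03 05:59:00"),
                        tz = "UTC")

# Keep only the messages sent before 6 am.
early <- chat[chat$time$hour < 6, ]

p <- ggplot(early, aes(x = time$hour, fill = name)) +  # data and aesthetics
  stat_count(position = "dodge") +                     # one bar per person and hour
  ggtitle("WhatsApp messages per hour (before 6 am)") +
  xlab("hour of day") +
  ylab("number of messages") +
  theme(plot.title = element_text(face = "bold"))      # bold title

print(p)
```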

A little confused? No worries, it’ll make more sense in a second. The aes() part defines your plot’s aesthetics; here, it tells the function to put the hours of the column “time” on the x-axis and to fill the bars with a different color for each name (remember, the column name contains the first names of the group members). The height of each bar is the number of messages that person wrote in that hour.

If you run just the first line, you won't get a chart yet. We’ve created a new ggplot(), but haven’t told it what kind of graphic to actually plot. So in the next lines, we’ll add visual elements to the plot to define how it should look. These statements are linked with the “+” operator. In ggplot2, it adds each new component to the same plot.

The function stat_count() counts the messages and plots the bars. For the position argument, you can also use the option “stack” instead of “dodge” if you want the bars for the different group members stacked on top of each other instead of side by side. You don’t necessarily need the rest of the lines to create a plot, but they let you add titles and labels to your plot and set the font face of the title (to “bold”, in this case). If you run the code, something like this should appear in your plot window in RStudio:

[Figure: bar chart of messages per hour before 6 am, with one bar per group member]
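To see what the position argument really changes, you can look at the numbers ggplot2 computes before drawing. A small sketch with made-up data, where two people write during the same hour:

```r
library(ggplot2)

# Made-up data: Anna writes twice and Ben once, all at 2 am.
msgs <- data.frame(
  name = c("Anna", "Anna", "Ben"),
  hour = c(2, 2, 2)
)

base <- ggplot(msgs, aes(x = hour, fill = name))
side_by_side <- base + stat_count(position = "dodge")
stacked      <- base + stat_count(position = "stack")

# layer_data() returns the values ggplot2 computed for drawing a layer.
cols <- c("count", "xmin", "xmax", "ymin", "ymax")
layer_data(side_by_side)[, cols]  # bars start at 0 and sit next to each other
layer_data(stacked)[, cols]       # bars share the same x range and pile up
```

With “dodge”, the bars differ in xmin and xmax; with “stack”, they differ in ymin and ymax.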

Neat, right? Here’s another example. This time, we’ll use the entire chat data frame in the ggplot() function, since we want to take a look at the overall message counts by hour, and we’ll change the title to italics.
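Again as a self-contained sketch with a made-up chat data frame and made-up titles and labels, the second plot might be built like this:

```r
library(ggplot2)

# Toy stand-in for the cleaned chat log (names and timestamps are made up).
chat <- data.frame(
  name = c("Anna", "Ben", "Anna", "Cem", "Ben", "Anna"),
  stringsAsFactors = FALSE
)
# Add the POSIXlt column separately: data.frame() would convert it to POSIXct.
chat$time <- as.POSIXlt(c("2015-11-02 01:10:00", "2015-11-02 09:45:00",
                          "2015-11-02 09:50:00", "2015-11-02 13:30:00",
                          "2015-11-02 21:00:00", "2015-11-03 21:15:00"),
                        tz = "UTC")

# This time the whole data frame goes in, without filtering by hour.
p <- ggplot(chat, aes(x = time$hour, fill = name)) +
  stat_count(position = "dodge") +
  ggtitle("Conversations per hour") +
  xlab("hour of day") +
  ylab("number of messages") +
  theme(plot.title = element_text(face = "italic"))  # italic title

print(p)
```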

It will look something like this:

[Figure: Journocode conversations per hour]

Now, feel free to play a little with this on your own! Try changing the value for the x-axis in aes(x = ...) to time$mon to take a look at the monthly message counts (mon is the month as a number from 0 for January to 11 for December), or simply change the titles and labels. Or try something entirely different: You could analyze not just the message counts, but the chat content as well. There are R packages for creating word clouds that might offer great visualization options. You’re welcome to share your ideas with us. Feel free to contact me or any of the other journocode members via e-mail, Twitter or our Slack team.
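If you try the monthly counts, it helps to look at the month component on its own first. A short sketch with made-up timestamps:

```r
# Made-up timestamps, parsed into a POSIXlt object like the time column.
stamps <- as.POSIXlt(c("2015-09-14 10:00:00", "2015-10-01 23:15:00",
                       "2015-10-20 07:30:00", "2015-12-24 18:00:00"),
                     tz = "UTC")

stamps$mon                      # month index starting at 0 (0 = January)
month_number <- stamps$mon + 1  # shift to the usual 1 to 12
table(month_number)             # messages per month
```

In the plot itself, aes(x = time$mon + 1) would put the usual month numbers on the x-axis.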

 


{Credits for the awesome featured image go to Phil Ninh}
