Scraping for everyone

Scraping for everyone

by Sophie Rotgeri, Moritz Zajonz and Elena Erdmann

One of the most important skills for data journalists is scraping. It allows us to download any data that is openly available online as part of a website, even when it’s not supposed to be downloaded: may it be information about the members of parliament or – as in our christmas-themed example – a list of christmas markets in Germany.

There are some tools that don’t require any programming and which are sufficient for many standard scraping tasks. However, programmed scrapers can be fitted precisely to the task at hand. Therefore, some more complicated scraping might require programming.

Here, we explain three ways of scraping: the no-programming-needed online tool import.io and writing a scraper both in R and Python – since those are the two most common programming languages in data journalism. However, if you read the different tutorials, you will see that the steps to programming a scraper are not that different after all. Make your pick:

No programming needed Python R


Scraping with Import.io

There are two steps to scraping a website. With import.io, you first have to select the information important to you. Then the tool will extract the data for you so you can download it.

You start by telling import.io which URL it should look at, in this case the address “http://weihnachtsmarkt-deutschland.de”. When you start a new “Extractor”, as import.io calls it, you will see a graphical interface. It has two tabs: “Edit” will display the website. Import.io analyses the website’s structure and automatically tries to find and highlight structured information for extraction. If it didn’t select the correct information , you can easily change the selection by clicking on the website’s elements you are interested in.

In the other tab, “Data”, the tool shows the selected data as a spreadsheet, which is how the downloaded data will look like. If you are satisfied, click “Extract data from website”.

Now, the actual scraping begins. This might take a few seconds or minutes, depending on the amount of data you told it to scrape. And there we go, all the christmas markets are already listed in a file type of your choosing.


Scraping with Python

There are two steps to scraping a website. First, you download the content from the web. Then, you have to extract the information that is important to you. Good news: Downloading a website is easy as pie. The main difficulty in scraping is the second step, getting from the raw HTML-file to the content that you really want. But no need to worry, we’ll guide you with our step-by-step tutorial.

Let’s get to the easy part first: downloading the html-page from the Internet. In Python, the requests library does that job for you. Install it via pip and import it into your program. To download the webpage, you only have to call requests.get and hand over the url, in this case the address “http://weihnachtsmarkt-deutschland.de”. Get() will return an object and you can display the downloaded website by calling response.text.

Now, the entire website source, with all the christmas markets, is saved on your computer. You basically just need to clean the file, remove all the HTML-markers, the text that you’re not interested in and find the christmas markets. A computer scientist would call this process parsing the file, so you’re about to create a parser.

Before we start on the parser, it is helpful to inspect the website you’re interested in. A basic understanding of front-end technologies comes very handy, but we’ll guide you through it even if you have never seen an HTML-file before. If you have some spare time, check out our tutorial on HTML, CSS and Javascript.

Every browser has the option to inspect the source code of a website. Depending on your Browser, you might have to enable Developer options or you can inspect the code straight away by right-clicking on the website. Below, you can see a screenshot of the developer tools in Chrome. On the left, you find the website – in this example the christmas markets. On the right, you see the HTML source code. The best part is the highlighting: If you click on a line of the source code, the corresponding unit of the website is highlighted. Click yourself through the website, until you find the piece of code that encodes exactly the information that you want to extract: in this case the list of christmas markets.

All the christmas markets are listed in one table. In html, a table starts with an opening <table> tag, and it ends with </table>. The rows of the table are called <tr> (table row) in html. For each christmas market, there is exactly one row in the table.

Now that you have made yourself familiar with the structure of the website and know what you are looking for, you can turn to your Python-Script. Bs4 is a python library, that can be used to parse HTML-files and extract the HTML-elements from it. If you haven’t used it before, install it using pip. From the package, import BeautifulSoup (because the name is so long, we renamed it to bs in the import). Remember, the code of the website is in response.text. Load it by calling BeautifulSoup. As bs4 can be used for different file types, you need to specify the parser. For your HTML-needs, the ‘lxml’ parser is best suited.

Now the entire website is loaded into BeautifulSoup. The cool thing is, BeautifulSoup understands the HTML-format. Thus, you can easily search for all kinds of HTML-elements in the code. The BeautifulSoup method ‘find_all()’ takes the HTML-tag and returns all instances. You can further specify the id or class of an element – but this is not necessary for the christmas markets. As you have seen before, each christmas market is listed as one row in the table, and marked by a <tr> tag. Thus, all you have to do is find all elements with the </tr> tag in the website.

And there we go, all the christmas markets are already listed in your Python output!

However, the data is still quite dirty. If you have a second look at it, you find that each row consist of two data cells, marked by <td>. The first one is the name of the christmas market, and links to a separate website with more information. The second data cell is the location of the christmas market. You can now choose, which part of the data you are interested in. BeautifulSoup lets you extract all the text immediately, by calling .text for the item at hand. For each row, this will give you the name of the christmas market.

Done! Now you have the list of christmas markets ready to work with.


Scraping with R

There are two steps to scraping a website. First, you download the content from the web. Then, you have to extract the information that is important to you. Good news: Downloading a website is easy as pie. The main difficulty in scraping is the second step, getting from the raw HTML-file to the content that you really want. But no need to worry, we’ll guide you with our step-by-step tutorial.

Let’s get to the easy part first: Downloading the HTML-page from the Internet. In R, the “rvest” package packs all the required functions. Install() and library() it (or use “needs()”) and call “read_html()” to.. well, read the HTML from a specified URL, in this case the address “http://weihnachtsmarkt-deutschland.de”. Read_html() will return an XML document. To display the structure, call html_structure().

Now that the XML document is downloaded, we can “walk” down the nodes of its tree structure until we can single out the table. it is helpful to inspect the website you’re interested in. A basic understanding of front-end technologies comes very handy, but we’ll guide you through it even if you have never seen an HTML-file before. If you have some spare time, check out our tutorial on HTML, CSS and Javascript.

Every browser has the option to inspect the source code of a website. Depending on your Browser, you might have to enable Developer options or you can inspect the code straight away by right-clicking on the website. Below, you can see a screenshot of the developer tools in Chrome. On the left, you find the website – in this example the christmas markets. On the right, you see the HTML source code. The best part is the highlighting: If you click on a line of the source code, the corresponding unit of the website is highlighted. Click yourself through the website, until you find the piece of code that encodes exactly the information that you want to extract: in this case the list of christmas markets.

All the christmas markets are listed in one table. In html, a table starts with an opening <table> tag, and it ends with </table>. The rows of the table are called <tr> (table row) in html. For each christmas market, there is exactly one row in the table.

Now that you have made yourself familiar with the structure of the website and know what you are looking for, you can go back to R.

First, we will create a nodeset from the document “doc” with “html_children()”. This finds all children of the specified document or node. In this case we call it on the main XML document, so it will find the two main children: “<head>” and “<body>”. From inspecting the source code of the christmas market website, we know that the <table> is a child of <body>. We can specify that we only want <body> and all of its children with an index.

Now, we have to further narrow it down, we really only want the table and nothing else. To achieve that, we can navigate to the corresponding <table> tag with “html_node()”.This will return a nodeset of one node, the “<table>”. Now, if we just use the handy “html_table()” function –- sounds like it was made just for us! –- we can extract all the information that is inside this HTML table directly into a dataframe.

Done! Now you have a dataframe ready to work with.

Extracting geodata from OpenStreetMap with Osmfilter

Extracting geodata from OpenStreetMap with Osmfilter

A guest post by Hans Hack

When working on map related projects, I often need specific geographical data from OpenStreetMap (OSM) from a certain area. For a recent project of mine, I needed all the roads in Germany in a useful format so I can work with them in a GIS program. So how do I do I get the data to work with? With a useful little program called Osmfilter.

There are various sites that provide OSM datasets for certain areas. However, these datasets include ALL the geodata OSM provides. That means houses, streets, rivers etc – basically everything you see on a normal map. In addition, the file is in a rather inconvenient format for direct usage. A quite complicated way to proceed with the dataset would be to load it into a database and query what you need. This can be time consuming and – depending on the power of your PC – impossible.

If you only need to get info about small areas, I recommend using the site overpass turbo. For larger datasets, there is Osmfilter, which easily lets you filter OSM geodata. With a little help from two other free tools, you will have a dataset you can work with in no time.

 

What you’ll need for this tutorial

 

Get an OSM Dataset

To start out, we need to download an OSM dataset, which is saved in a format called pbf (a format to compress large data sets). For this tutorial, I will use a dataset provided by geofabrik, but there are other sources, too. Lets download the pbf file for Liechtenstein and save it in a folder of your choice. Once you have downloaded the data, open the shell and go to the folder where your new dataset is stored with the command

Prepare your data

Osmfilter only supports the file formats osm and o5m. For fast data processing, using o5m is recommended. You can convert your pbf file to o5m with osmconvert in your shell like this:

This translates to: Use the program osmconvert and convert the file called liechtenstein-latest.osm.pbf to a o5m file called liechtenstein. The -o stands for output.

You will now have the same dataset in the o5m format in the same folder.

Filter your data

Now, you can filter your geodata using the shell that should still be open. The osmfilter command logic is built up as follows:

Let’s look at the part about the filter commands. Here is where you can tell the program which parts of the dataset you need by writing --keep=DATA_I_WANT . Here is an example which creates a file for you called buildings.osm that contains all the buildings (and only the buildings) from the Liechtenstein dataset:

To find out how features are stored and classified in OSM, you can go to this site and look up the feature you want. Tip: You can head over to overpass turbo to test your query on a small area of your choice by using the ‘wizard’.

Of course you can do much more with Osmfilter. You can specify which building type you want. For example, you might only want to look at schools:

You can query multiple features by chaining them. For example if you want all the schools and universities:

You can exclude things by adding the flag –drop. For example, if you don’t want to have buildings that are warehouses but keep everything else:

You can reduce the final file size by dropping extra data on the authors and the version by adding:

You can, of course, combine these flags. Here is a query that gives you all the highway types that cars can use in Lichtenstein without the ones where motor vehicles are not allowed:

You find more on the filtering options on the Osmfilter site with some examples too.

Convert to a useful format

As a final step, you can convert your osm file to the most widely supported geodata format called a Shapefile (the GIS program QGIS can handle osm files, but it sometimes doesn’t work well with large datasets). You can convert your osm file to a Shapefile with the program ogr2ogr like this:

The above command converts the file streets_liechtenstein.osm to the shapefile format and tells it to store it in a folder called streets_shapefiles. In the newly created folder you will find 4 different shapefiles (one for every geometry type). In case of the streets, we are only interested in the file lines.shp. You can open this file in a GIS program of your choice.

Ogr2org also allows you to convert your newly created Shapefiles to other geodata formats that you may need, such as GeoJSON, CSV and many more. Have a look at the ogr2ogr website for more info. If you’re tired of using the shell, try the online tool Mapshaper which allows you to convert your Shapefile file to formats such as GeoJSON, SVG and CSV. The file size for Mapshaper is limited but I have tried it with files bigger than 1 GB.

 

Have fun filtering OSM and happy mapping!

 

JavaScript: Interactive Leaflet.js map with search bar

JavaScript: Interactive Leaflet.js map with search bar

In our previous posts about Leaflet.js, we coded an interactive marker map and learned how to update our data with a google spreadsheet. In this short tutorial, we will show you how to make your map searchable so users can find a specific marker.

We’ll use the code of the already finished map that’s hosted on our GitHub page so we don’t have to start at the very beginning again. Simply download the directory to your computer and open it in your text editor. If you’ve never created an interactive map with Leaflet.js before, please have a look at the two tutorials mentioned above before proceeding with this one.

HTML

Adding a search bar to the map isn’t hard at all, thanks to Italian programmer Stefano Cudini. He is the author of Leaflet.Control.Search, a JavaScript library that will do the trick for us. First, download the libraries’ JS and CSS code from his GitHub repository and save it in the map’s directory. It is currently named loop_google_spreadsheet, but you can rename it to whatever you want.

Now, open index.html in the editor and include the two files you just downloaded:

Done. Now we get to the part where we’ll change a little bit more of our map code.

JavaScript

Adding a visual search bar to the map is easy. First, we have to create a new layer which will be searchable later. Then we add the search control. This is how it’s done:

If you want a nice search icon to appear on the bar, download the images directory of Stefano Cudinis library and copy it into your map directory. Now, when you open the index file in a web browser, you should see the search bar on your map.

The search bar is useless as of now, since we haven‘t given the search bar any data to look through. So in the next step, we want to fill markersLayer with the search keywords and the markers’ location. We use the code of the Google Spreadsheet map, so we’ll load the data with Tabletop, like we did in the previous tutorial, and set the URL, size and orientation of our icons. But we’ll delete or comment out the entire part where we actually add the markers on the map:

Now, to add the markers with matching popups to the map and also to the search corpus, we have to add the markers to markersLayer. We define the value searched as the markers’ title and the found position as loc. Then we’ll save every new L.marker as marker, bind the popup and add the whole thing to our layer.

And that’s basically it! Now you can search for the newsrooms’ names and have the map zoom to their corresponding location. You can also add the snippet

to zoom back to the normal level once you close the search bar.

And this is what your result should look like. Click on the search symbol, type in “Dat” and see what happens.

As always, you’ll find the entire code for this example on our GitHub page. If you have any questions, suggestions or feedback, feel free to leave a comment! We’ll try to answer fast.

JavaScript: Loop through a Google Spreadsheet

JavaScript: Loop through a Google Spreadsheet

In our post Coding marker maps with Leaflet.js we filled a map visualization with information given directly in the code, but we also learned how to loop through data in a JSON file to create markers and pop-ups. This time, we will loop through an public Google Spreadsheet. We will do this with the same example map as in the previous post. But this little trick will not only help you with building marker maps but can come in handy for any JavaScript application you need to automate the handling of your data.

So let’s start right away with setting up a data table with Google Spreadsheets.

How to encrypt your emails with PGP keys

by Moritz Zajonz 1 Comment
How to encrypt your emails with PGP keys

Important Notice: Considering the recent disclosure of vulnerabilities in popular e-mail clients like Mozilla Thunderbird, we decided to delete this post. The current PGP implementation in email clients has vulnerabilities, that haven’t been fixed for now and will take time to get fixed. For more information about the technical side visit efail.de and for a detailed explanation, read the post by the Electronic Frontier Foundation. Thanks for your interest in this topic! We will update this post when new info is available.

 {Credit for the awesome featured image goes to Phil Ninh}

Adding tooltips to SVG graphics

Adding tooltips to SVG graphics

Java Script libraries and other tools offer cool ways to visualize data, but sometimes, you may want an even more customizable way of presenting a topic on the web. Maybe you already have the perfect graphic, but it’s not interactive yet. In this tutorial, we’ll show you a way to add tooltips to your SVG graphics.

One example for an interactive graphic a library can’t give you is the following site plan showing the Düsseldorfer Rhine Funfair with its attractions and food stands I built for the Rheinische Post. You can find additional information (in german) as tooltips when hovering over some of the attractions.

The basis for the map is a simple image I created with the design program Sketch. The trick I used to add tooltips to some of the map elements doesn’t require much coding and can be used to add tooltips to every website element including divs, text and pictures.

So let’s get right to it!
In this tutorial we will add tooltips to a very simple examplary SVG file you can download here. SVG stands for scaleable vector graphic. It’s the name of a file format where every image object is described not by the colour of its individual pixels, but by its basic shapes and their attributes. A big advantage SVG graphics have over other formats like PNG or JPEG ist that they’re scalable, just like their name says, so they never get blurry, no matter how much you zoom in. One way to create SVG files is to use a design program like Illustrator, Inkscape or Sketch.

Pro tip: RStudio lets you export charts in this format, too. Pimp up boring charts with some creative elements like I did with this stacked barchart on germanys reasons for trading trash:

The following screenshot shows the SVG image we are going to add tooltips to. It contains a big rectangle with a turquoise frame as the background, a red circle, a yellow star and a blue triangle.

Bildschirmfoto 2016-07-30 um 18.54.21

If you save the image as an SVG file and open it with a text editor, this is what you’ll get:

This is not the typical way you expect images to look like. And this is what makes SVG awesome: It translates your artwork into a snippet of XML code. XML files, or Extensible Markup Language files, look a lot like HTML. They’re not, though: XML was developed as a way to store data, while HTML is responsible for displaying data, mostly on the web. XML doesn’t have predefined tags like <a> or <body>, like HTML does. In XML, you’re entirely free to make up your own tags to describe different parts of your data.

SVG graphics still follow a very specific XML structure. That structure will work like a charm. Try putting the SVG code into your websites body. Set the SVG width to a 100%, delete the height settings and save the file as index.html. It should look like this:

Et voilà: The static image is ready for the web! Your browser can interpret the XML code of your SVG image easily and convert it back to a beautiful graphic.

 

Adding interactivness

Great! Now, we will use the jQuery Plugin Tooltipster to add information to this svg! The Plugin has nice presettings and a really good documentation. These three steps will lead you to your first interactive SVG:

Step 1: Download the Plugin

Bildschirmfoto 2016-07-30 um 19.42.35Create a new folder, name it svg_interactive (for example) and store the index.html file in it. You can find and download the Plugin’s scripts and style sheets on our GitHub repository or on the Plugins webpage. The main folder contains plenty of subfolders and documents you’ll barely need. For our example, we’ll only use tooltipster.bundle.min.jstooltipster.bundle.min.css and, for a little bit of extra style, the theme sheet tooltipster-sideTip-punk.min.css. You can either save the whole tooltipster folder in svg_interactive and then set the correct paths to those files in your html head, or you only store the three files you’ll actually use in your folder like we did as shown in the screenshot.

Step 2: Include the scripts

Set the paths to the style sheets and the JS script in the head of your webpage. Don’t forget to include jQuery, since Tooltipster depends on it:

Step 3: Add information to the image elements

Next, we’ll add a small snippet of Java Script code to the head manually. You could also save these few lines in an extra .js file and include them like the others, but for now, we’ll just keep it as an inline script.

This snippet defines a tooltip class with the theme tooltipster-punk and specifies that the content will be HTML. If you add

to an SVG element it will get a tooltipster-punk themed tooltip with HTML styled text. And that’s basically it!

For example, try:

Now, if you want to change the tooltip’s style you can save some CSS settings in a new stylesheet and include it in the webpages head. You can change the font family, the color of the popup background, the popup arrow or the default pink border-bottom, for example, like this:

Save the file as style.css and include it in the .html file. You can also change the trigger for opening and closing the tooltips in the manually added Java Script. You can set it to hover, click or optimize it for mobile devices with a customized trigger like this:

And this is the result: Click on it, hover over the symbols or try it on your smartphone!

So be creative with your data visualizations, even if complex Java Script libraries aren’t yours.

As always, you can get the entire code to this example on our Github Page. If you have any questions or feedback, feel free to leave a comment or mail us!

JavaScript: Coding marker maps with Leaflet.js

JavaScript: Coding marker maps with Leaflet.js

Showing locations on a map can be pretty cool to provide some context for your story or to give your reader an overview of where the story takes place. A good way to build a simple, yet responsive and professional looking map is to use the JavaScript library Leaflet.

In an earlier post, “Your first choropleth map“, we used Leaflet as well, but coded the map using the Leaflet R package, which works like a wrapper to translate the more common Leaflet functions into R syntax. It’s very useful if you’re more used to R syntax and don’t want to learn JavaScript anytime soon. But using the original JS library and coding the map with JavaScript will give us way more freedom when customizing the map, which is why we’ll try it that way today.

As an example, let’s start with a map of the locations of some data journalism newsrooms in the German speaking area. As always you can find all the code of this tutorial on our GitHub page.

This is what the finished map will look like:

HTML, CSS & a little JavaScript: The Basics (Part II)

HTML, CSS & a little JavaScript: The Basics (Part II)

Part I

This is part two of our tutorial on HTML, CSS and a little bit of JavaScript. In the last part, we learned about the basic functions of those three languages and have gotten to know a few useful HTML commands. If you’ve already read part one or you know all of that stuff anyway, this is the perfect spot for you to continue – by learning about CSS and how to implement JavaScript libraries into your webpage. Let’s get right to it!

CSS

CSS is short for Cascading Style Sheets. It’s a neat way to tell your browser how it should make your webpage look. You can style your HTML code in different ways.

For small adjustments, it might be enough to use inline style. With this method, you specify the style adjustments for specific elements right in the elements tag. For example, we talked about the <span>  element in part one. This is how you would use <span>  and some inline style to colour part of your sentence:

Here, you can use the command style like an attribute. You can specify multiple things in one inline style attribute, like specifying style="color:red;text-align:right"  to change text alignment and color in one go, but make sure to put quotes around the entire thing or it won’t work.

 Of course, there are a lot more possible settings than just the text color:

  • You’ve just learned what color  does. You can choose from the default color names or specify by hexadecimal code or rgb color code.
  • background-color  sets the background color of the current element. That could be some text or an entire paragraph, a section defined by a div or even the whole body of the document.
  • font-family  specifies the font for an element.
  • font-size  specifies the size of the current text element.
  • text-align  lets you choose to align the text left, right, centered or justified. initial chooses the default value and inherit aligns the text according to the alignment of its parent element.

There’s more, but let’s just work with that for now. Try using inline style to change the above properties in you HTML document that you’ve created in part one of this tutorial.

If you want to define a lot of properties or edit a lot of elements, inline style might be a little inconvenient. Thankfully, there is a different way: You could use the style tag. Anything you type between <style></style>  in your document will be interpreted as CSS commands. You’ll usually do that in the head, since it counts as meta information. But if you write some CSS code in the head of your document, how is your browser supposed to know which element you’re referring to?

That’s why there are things called ids and classes. For  every HTML element in your document, you can define one (or even multiple) classes and a unique ID. It will look like this.

We just gave the first div the classes red and pushright as well as the ID a1. The second one only has the class red, and the third one doesn’t have any class or ID at all. Classes are usually used for a larger amount of element that should have the same CSS properties. IDs are usually unique and are a useful way to single out one specific element you want to manipulate.

Let’s use CSS in our <style>  tags to turn the text color of anything with the red class to red and to align right any element with the class pushright. The element with the ID a1 should be in bold print. Also, all divs without any classes should have the text color purple and be aligned left. It would look something like this:

If you copy and paste the above text into the <head> of your document, you should see the changes we wanted to apply the next time you refresh the file in the browser.

There’s one mystery though. The first style rule is supposed to apply to all <div>  elements and change their text color to purple. But two of our divs are red anyway. Of course, that’s because they have the red class, and apparently, the style rules for classes are stronger than those for general elements if there’s a contradiction.

In reality, it’s a little more complicated than that. There’s whole tutorials centered around the so-called specificity rules of CSS that determine which style rules get applied and what place in the CSS hierarchy they occupy. If some rules don’t get applied even though you think they should, it’s probably a specificity error. As a basic rule, though, we can remember this hierarchy, in order from highest to lowest priority:

  1. Inline style
  2. IDs
  3. classes
  4. general elements

Try to play around with these CSS settings a little. In the meanwhile, we’ll take a quick look at how to incorporate JavaScript libraries into your webpage and how to manage the files that make up your site.

JavaScript libraries

JavaScript is quite a powerful language that lets you manipulate pretty much everything about your webpage. It might get complicated pretty quickly though if you want to create more complex elements. Thankfully, you don’t have to do all the work yourself. There are lots of JavaScript libraries available online, bundles of functions that serve a specific purpose. If you want to code interactive maps for a webpage, you might be interested in leaflet, for example. Highcharts or the very powerful D3 library are a wonderful tool for creating all kinds of data visualizations. To use the functions provided in those libraries, you’ll have to import them into your current document.

Most libraries even provide tutorials on how to do that. The only thing you actually need about that are the files where the functions of the library are defined. you’ll find the links to those on the webpages of the libraries. Once you know where the files are, you have to options for importing them. You can either use a hosted version that the creators of the library have uploaded to their servers. In that case, just import them by typing

for CSS stylesheets or

for Javascript files into the head of your HTML document. That’s all there is to it!

If you want to be on the safe side, you can also download the files from the links provided and save them on your own computer. If you do that, just replace the URL in the above commands with the path to the files on your computer. Not that scary, right.

File management: External files and file paths

While we’re at the topic of importing external scripts, let’s take a quick look at file management before we call it a day. Until now, we’ve coded our entire webpage in a single document. we’ve embedded CSS code with the <style>  tags and JavaScript with the <script>  tags. Once you build more complex pages, that might mean your document will get pretty lengthy and confusing. There’s an easy way out, though: Just save your CSS code as a .css file and your JavaScript bits as a .js file. It’s helpful to create a new folder to keep all the files in one place. The HTML document then serves as the hub that combines all the files into one page, which is why it’s customary to name your main HTML document index.html.

htmlfiles1

Then, you can refer to the files in the same way we explained above for libraries – after all, they’re nothing but scripts and style documents themselves.

If you put the links to the files in the same place we put the scripts and style rules before, there should be no difference between separate and joint files.

When specifying the file paths, you don’t have to use the complete paths. It’s possible to do so, but it’s not advisable. Because if you move your folder or decide to deploy your page to a server instead of keeping it on your computer where only you can see it, the paths change and your index.html won’t be able to locate the files it needs. Instead, you can use the relative file path, meaning the path relativ to the current location of your index.html. So if you keep all your files in one folder, it should just be enough to only specify the file name. That will make it look in the current directory. If your files are in a subfolder, just specify the path from index.html to the file with "subfolder1/subfolder2/filename.css" , for example. If the file is located in an above directory, ".."  is what you need to type to get to the parent directory of the current one. So if (for some reason) your files are organized like this:

htmlfiles2

Then what you need to type to get from you index.html file to your script.js file is:

 

That’s all, folks!

That’s it for today. We hope you got a vague idea of what to google if you want to learn about web development. This is a wonderful step on your journey towards becoming a proficient web developer – or, at least, towards understanding what the heck those proficient web developers are talking about. We’re going to dive a little deeper into the world of HTML, CSS and JavaScript in our next tutorials. You’re going to get to know some interesting libraries and tutorials on how to build webpages and interactive graphics yourself. So we hope you stick around for a while! In the meantime, feel free to comment or join our slack team if you have any questions or if you just want to say hi. See you soon!

Part 1

 

{Credits for the awesome featured image go to Phil Ninh}

HTML, CSS & JavaScript: The Basics (Part I)

HTML, CSS & JavaScript: The Basics (Part I)

Part II

Becoming a proficient web developer is hard — but understanding the basics isn’t. So this is what we’ll do today. By the end of this tutorial, you should have an idea of what people mean when they talk about HTML, CSS and JavaScript.

In this part, we’ll talk about the purpose of those three and learn a bit of basic HTML. In part two, we’ll learn a little more about CSS and JavaScript, especially about the use of JavaScript libraries, and how to combine all three to build a website. So let’s do this!

HTML, CSS and JavaScript are what most webpages are made of. They work together, each with a specific role. While you write a webpage, it will basically just look like a text document. You could write it in any text editor of your choice, although it’s probably a good idea to use a code editor like Atom or Sublime Text that will help you with formatting and code completion.

Once you’ve written the page, your browser is where the magic happens. It interprets the code you wrote in your .html, .css and .js text files and constructs a fully functioning, beautiful website from it – given there’s no errors in your code, of course. But that’s what we’re here for today.

HTML, or HyperText Markup Language, is the backbone of any webpage. It determines its basic structure and content. However, if you only use HTML, your webpage will end up looking like the very first pages from the early days of the web. No pretty formatting, no interactive elements. You’ll see it sometimes when your internet connection is very slow and your browser won’t load the page properly. It’s okay, but we can to better. That’s what CSS is for.

CSS is short for Cascading Style Sheets. This is the code that tells your browser how your webpage should look: What fonts and colors to use, what size your text should be etc.. But a pretty-looking website is not all you could want. You might want to interact with the webpage, click things, move stuff around and have it respond to your actions. HTML and CSS can’t do that very well, but JavaScript can.

JavaScript tells your browser how the page should behave. It’s what you write interactive graphics with, or pretty much any interactive element on your website. Compared to the other two, it’s a pretty complex language that you probably won’t learn entirely in a week or two. But that’s okay. For now, it’s enough to know what it does.

So if you only remember one thing from this tutorial, let it be this summary of what makes up a webpage:

HTML: Strucure/content
CSS: Style
JavaScript: Behaviour

If your mind has a little more room left right now, then let’s take a closer look at how to build a webpage. As mentioned, webpages are nothing but text documents that your browser knows how to interpret. To write these documents, you can use any code editor of your choice. I use Atom, which works very nicely as long as you don’t try to view giant datasets.

We’re going to look at some useful commands in this tutorial. But of course, there’s lots more to discover. If you want to look up more stuff, W3schools is a good place to start. You’ll find references for HTML, CSS, JavaScript and more. There are also some good MOOCs (Massive Open Online Courses) for web development. Take a look over at Coursera or CodeAcademy, for example.

It’s always a good idea to look at the code of other webpages and see how they work. To do so, right click anywhere on a website. The details are different for each browser, but there will be an option like “inspect element” and “show source code”. If you click it, you should see a lot of code. This is the website how your browser sees it. Try to understand what’s going on. This is a good source of inspiration for your own pages as well.

While coding your own projects, it might also be able to take a look at the console of your browser. You can find it by clicking “inspect element” (or whatever corresponds to that in your browser) and choosing the tab named “console” in the window that opens. The console will show you an error message if something isn’t right with your page. That message can help you identify the problem, or at least give you something to google if you don’t understand the problem right away.

But you’ll get to that later. First, let’s get to know the basics.

HTML

Let’s start with some HTML code. If you want to code along, open up a new document in the code editor of your choice. Copy the code below into the file and save it as, for example, index.html (the .html part is the important one).

This is the structure each HTML document follows. The first thing you’ll notice is lots of angle bracket action. The words in between the angle brackets are keywords we call tag names. Together with the bracket, they make up a tag that serves a specific function. HTML content is always placed between a start tag and an end tag that begins with a slash, like this:

Most tags accept attributes that specify how they should behave:

We’ll get to know some examples of that soon.

Any HTML document starts with the declaration <!DOCTYPE html> that tells the browser what kind of document to expect (html in this case, duh). Then, there’s the <html>  tag. It doesn’t close until the very end of the page and makes it extra clear that anything in between will, in fact, be HTML code.

The two basic parts of your page are the <head>  and the <body> . The head contains all the meta information you want or need to make your website function properly. It can be used, for example, to give your website a name ( <title>), load necessary CSS information ( <style>) or JavaScript sources ( <script>), which we’ll get to later on. You can also set lots of other stuff like the language or the character encoding in the head.

Anything in the head will not be directly visible on the webpage, so if you want to finally add some content, take a look at the body of your page. This is where you put your actual content. There’s quite a few options as to how to write your text and what media to incorporate into your page. Let’s take a quick look at a few important tags for that. If you want, copy them into your freshly made HTML document and play around with them a little. To see what they do, just save the file and open it in your browser (you can leave it open in your code editor as well, of course.

 

Formatting
  • <p>  stands for “paragraph”. Use it to write some important stuff onto your website, and don’t forget to close your paragraph with </p> . End tags are important for almost all the tags we’ll discuss here.
  • <h1> ... <h6>  are headings. The higher the number, the smaller the font of your heading will be.
  • <b>  or <strong>  ist used to use bold print like this for your text. Both commands do essentially the same thing.
  • <i>  or <em>  are both used to write in cursive.
  • <u> underlines text, but you can forget about that one instantly. On the web, it’s pretty much never a good idea to underline text that’s not a link, and those are underlined by default.
  • <br>  creates a line break. This is one of very few so-called void tags that don’t need a closing tag.
Lists
  • <ul>  creates an unordered list like the one you’re reading right now.
  • <ol>  creates an ordered list. It can be modified with some attributes. For example,  <ol start=343 reversed>  will start the list at number 343 instead of 1 and count down from there instead of up. Here you can find a list of all possible attributes.
  • <li>  creates a list element to populate the unordered or ordered lists you created.
  • Your shopping list in HTML might look something like this:
Media
  • Links: <a href=„URL“ target=_blank“>Link text</a> The “a” stands for anchor, and is used to create a link. Put the URL you want to refer to in the href attribute. target=”_blank” makes the link open in a new tab. If you leave it out, it will default to opening the link in the current tab.
  • Images: <img src=“file path/URL“ height=“550“ width=“100%> The img tag is another void tag, so you don’t need to type </img>  after it. Try experimenting with the width and height settings. If you specify them as just a number, the unit will default to pixels. By specifying percentages, you can scale the image according to the screen it’s viewed on. With the above specifications, for example, our picture will the entire width of our screen and be 550 pixels high. Note that if you specify both width and height, you might skew the proportions of your image. That is why sometimes, it’s better to only specify either the width or the height settings.
  • Audio: <audio src=“file path/URL“ controls autoplay loop> Sorry, your browser can’t handle this </audio> The audio tag lets you incorporate any audio file into your website. It’s probably a good idea to type in controls as an attribute so the user can pause and unpause the audio. Autoplay and loop are pretty self-explanatory. Better think twice about whether you actually want your audio to play instantly and on a loop, though. The text between the opening and closing tag of this line is only displayed if the browser can’t load the audio file to inform users that there should, in fact, be something here.
  • Video: <video src=“file path/URL“ controls autoplay loop> Sorry, your browser can’t handle this </video>  The video tag works pretty much the same as the audio tag, just, well, with videos. You can also specify width and height like with an image.

Inline Frames

An Inline frame, or iframe for short, works pretty much like the video ot audio tags, but it’s much more powerful. It let’s you display any content, even a whole other website, inside the frame you specify. It’s kind of like a window to another page. It’s very useful for easily displaying graphics or web apps and such, but it can be a bit tricky to make it properly responsive. So if you’ve got the chance, always try to natively integrate your pretty interactive graphics into your webpage. But if you don’t have the time or resources, an iframe will do just fine. Take a look at its possibly attributes here.

Structure
  • There’s some HTML tags that don’t appear to be doing anything on first sight. They’re structural elements designed to let you style the layout of your page.
  • <div>content</div>  A div, short for division, is one of those elements. It’s used to define a separate section in your HTML document. If you don’t use CSS or JavaScript to tell the browser what to do with it, you won’t even notice it’s there. But if used properly, divs can be a great layout tool.
  • <span>  is like the divs little sibling. While a div can be used to style entire sections of a text, move them or give them a specific format, <span> is more commonly used for styling smaller elements, like if you want to give part of your sentence a different color.

If you’ve followed this tutorial and tried out some of the options we’ve discussed so far, you should have something that resembles a website from the early 2000s by now. That’s pretty cool. Let’s see if we can make it even cooler. If you want to learn about styling your website with CSS and making it interactive with JavaScript, please continue with part two of our tutorial. See you there!

Part II

{Credits for the awesome featured image go to Phil Ninh}

R: Your first web application with shiny

R: Your first web application with shiny

Data driven journalism doesn’t necessarily involve user interaction. The analysis and its results may be enough to write a dashing article without ever mentioning a number. But let’s face it: We love to interact with data visualizations! To build those, some basic knowledge of JavaScript and HTML is usually required.
What? Your only coding skills are a bit of R? No problemo! What if I told you there was a way to interactively show users your most interesting R-results in a fancy web app?

Shiny to the rescue

Shiny is a highly customizable web application framework that turns your analysis into an interactive web app. No HTML, no JavaScript, no CSS required — although you can use it to expand your app. Also, the layout is responsive (although it’s not perfect for every phone).

In this tutorial, we will learn step by step how to code the shiny app on Germany’s air pollutants emissions that you can see below.