Open Refine

by Kira Schacht 0 Comments
Open Refine

OpenRefine is your new best friend for data cleaning. It does most of the stuff Excel does, just… better.

Formerly called Google Refine, OpenRefine is “a powerful tool for working with messy data”, as they claim on their website. It’s mostly pointing and clicking and doesn’t require coding skills, although you’ll only discover the true potential of the program once you’re comfortable using regular expressions and typing simple search-and-replace commands yourself. But don’t worry, there’s tutorials on all the functions in their User Documentation.
Importing and exporting data in OpenRefine is easy, has lots of options to accommodate your data structure and saves you the encoding mess you often get when working in Excel.
OpenRefine can filter, sort, subset and transform data in any way your heart desires. It helps you view and clean data categories by hand and even offers Clustering algorithms to automatically find similar spellings and the like.
Those functions may be difficult to find at first, but once you understand the structure of the program, it’ll be a breeze, we promise.
For the more tech-savvy among you: OpenRefine can interpret and parse JSON output from URLs, so it can be used to interact with many APIs, which means it can do all kinds of stuff, for example geocode locations via the Google Maps API.

There are a few things that aren’t intuitive in OpenRefine, though. It works mostly on your existing records, so copying and pasting rows, although possible, is not easy. Same goes for building summaries of your data. Those things are better done in a different program – once you scrubbed your dataset squeaky clean in Refine.

Leave a reply

Your email address will not be published.

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

This site uses Akismet to reduce spam. Learn how your comment data is processed.