
Adrià Mercader - Blog

Mapping library closures: a tale of budget cuts, civic hacking and [Open]Data

# Posted on Sunday, 27th January 2013 , under Open Data Leaflet Python Newcastle

First things first, Newcastle City Council plans to close down 10 libraries across the city as part of a wider series of budget cuts. This represents the closure of more than half of all existing branches and will cause considerable harm, apart from being just silly.

If you are concerned about it, head to SaveNewcastleLibraries.org to find ways to get involved, or click on the image below to open a map showing the location of the planned closures, as well as the contact details for their local councillors.

The rest of this post explains the story behind the data used to build this map.

A call for action

A while ago I learnt, via one of its tweets, about the plans Newcastle City Council had for the local libraries. Closing libraries is not a nice thing to do, so I was immediately curious about the extent of the proposal: would my local branch be affected? How many branches across the city would close?

Buried deep beneath the linked survey website there was a link to a document with the Library Network Reduction Proposal (pdf, 115 Kb), which you have to trawl through to actually get the names of the affected libraries. These turned out to be 10 out of the 18 existing branches, some of which I didn't even know the location of, so my immediate reaction was to create a map of them. What if I also added the contact details of the local councillors for each branch so people could raise their concerns more easily?

That was it then: get the libraries' locations and the local councillors, do a bit of processing to add the planned status to each library, put together a nice Leaflet / Mapbox combo and we are all set, right? Right?

Sadly it was not going to be that easy.

Looking for available data

The first port of call for spatial developers looking for free and open geodata is probably OpenStreetMap. Unfortunately, the library POIs for Newcastle on OSM were incomplete and included libraries which didn't belong to the Council, so that wasn't going to be an option.

Unlike their Sunderland neighbours, Newcastle City Council's website does not have a public downloads page with Open Data datasets (in this sense, mackems are well ahead of geordies). The closest thing I found was a page with a Bing map showing library locations. It looked like it would be scraping after all.

Now, a list of councillors and their email addresses should be easy to find on any Council website. The Newcastle City Council one does indeed have a list of all councillors with their contact details, even comma separated! The fact that the list is printed on the page rather than in a separate linked CSV file would be only mildly irritating if it wasn't for the fact that this list does not show which ward the councillors belong to, which is quite an important piece of information to miss.

The only way of getting the ward a councillor belongs to is via a web form, so this one also looked like it was going to be copy & paste.

Structured data is born

Michelangelo is quoted as saying that inside every block of marble there is a statue waiting to be freed. Similarly, inside every dodgy web page from an organization there is a nice clean dataset waiting to be created.

In these first stages, data wranglers need to get their hands dirty, copying and pasting or scraping data into their favourite editors, doing some regexp wizardry and manually editing the data. The result is a file that can be used by a computer program to display its contents, transform its format, combine it with other files, etc.

In my case, I ended up with the library names and locations (plus the plans for each according to the reduction proposal) in CSV and GeoJSON format, which is very convenient for mapping applications.

#libraries.csv
name,plan,due,longitude,latitude
"Moorside Library",close,"June 2013",-1.635681,54.976958
"Kenton Library",keep,,-1.653120,55.005594
[...]

#libraries.geojson
{"type":"FeatureCollection",
"features":[
{"type":"Feature","properties":{"name":"Moorside Library","plan":"close","due":"June2013"},"geometry":{"type":"Point","coordinates":[-1.635681,54.976958]}},
{"type":"Feature","properties":{"name":"Kenton Library","plan":"keep","due":""},"geometry":{"type":"Point","coordinates":[-1.653120,55.005594]}},
[...]
]}

On the other hand, thanks to the Council website, I had two different CSV files for the councillors: one with the contact details of each councillor and another with their ward and party.

So what needed to be done was:

  1. Find out in which ward each library was located.
  2. Merge the two councillors files to have a single file with the ward and email of each councillor.
  3. Assign the relevant councillors to each library depending on the ward it was located in.

Which ward does library X belong to?

The basic operation we need to perform is: knowing the location of a point (a library), extract an attribute from the polygon layer that contains it (a ward). People familiar with GIS will probably think of doing a spatial join, but in my case I didn't have the wards polygon layer, and if there is a suitable API available there is a quicker way, provided you don't mind putting together a quick script.

This API is the excellent MapIt by MySociety. It basically returns the administrative areas that cover any given point or postcode (in the UK), which is exactly what we need here. A request looks like this one; note the longitude and latitude at the end:

http://mapit.mysociety.org/point/4326/-1.635681,54.976958

This Python script reads the libraries GeoJSON file, loops through each library and queries the MapIt API for its coordinates, storing the result back in a new GeoJSON file:

import json
import requests

query_by_point_url = 'http://mapit.mysociety.org/point/4326/{lon},{lat}'

with open('libraries.geojson', 'r') as f:
    libraries = json.loads(f.read())

for library in libraries['features']:
    url = query_by_point_url.format(
            lon=library['geometry']['coordinates'][0],
            lat=library['geometry']['coordinates'][1]
            )

    areas = requests.get(url).json()

    for key, area in areas.items():
        if area['type'] == 'MTW':   # We want Metropolitan district wards
            library['properties']['ward_id'] = key
            library['properties']['ward_name'] = area['name']
            break

with open('libraries_with_wards.geojson', 'w') as f:
    f.write(json.dumps(libraries))

As a small aside, it is worth mentioning that the MapIt API is also a great way of obtaining the geometries of administrative areas, which are returned directly as GeoJSON. For instance, this script will get all ward geometries for Newcastle, but it should be trivial to adapt it to any other area.

Wards
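
Roughly, fetching the ward geometries can be done with something like the following, using the MapIt endpoints for listing areas, their children and their geometries (the district lookup by name and the output file name are assumptions of mine):

import json
import requests

mapit_url = 'http://mapit.mysociety.org'

# Find the Metropolitan district (type MTD) whose name mentions Newcastle
districts = requests.get(mapit_url + '/areas/MTD').json()
newcastle_id = [key for key, area in districts.items()
                if 'Newcastle upon Tyne' in area['name']][0]

# Get its wards (type MTW) and fetch each ward geometry as GeoJSON
wards = requests.get(
    mapit_url + '/area/{0}/children?type=MTW'.format(newcastle_id)).json()

features = []
for ward_id, ward in wards.items():
    geometry = requests.get(
        mapit_url + '/area/{0}.geojson'.format(ward_id)).json()
    features.append({
        'type': 'Feature',
        'properties': {'name': ward['name']},
        'geometry': geometry,
    })

with open('newcastle_wards.geojson', 'w') as f:
    f.write(json.dumps({'type': 'FeatureCollection', 'features': features}))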

Cleaning up and merging CSV files

For cleaning and working with CSV files or spreadsheets, OpenRefine (The Software Formerly Known As Google Refine) is extremely useful. It is definitely highly recommended for anyone looking for a good tool for working with data, and there are several good resources across the web to help you get started.

Wards

OpenRefine allows you to perform basic cleaning up operations, for instance:

  • Finding blank cells (Select column menu > Facet > Customized facets > Facet by blank)
  • Trim spaces (Select column menu > Edit cells > Common transforms > Trim leading and trailing whitespace)

But it also supports more advanced stuff, like importing a column from another file which can be joined by a common column. In my case, I needed to import the "Email" column from the contact details file into the one that had the ward names, and the common column was the councillor name.

  • Load both files into OpenRefine (in my case "councillors wards csv" and "councillors address csv").
  • On the wards one, on the column "name" > Edit column > Add column based on this column...
  • The cross function needs to be used: cell.cross("councillors address csv","Name").cells["Email"].value[0]

Done. The resulting file can be exported again as CSV using the option on the top right side of the page.
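
The same merge could also be done with a few lines of Python using the csv module. Here is a sketch, where the exact file names and column headers are assumptions based on the ones mentioned above:

import csv

# Build a lookup of councillor name -> email from the contact details file
with open('councillors-addresses.csv') as f:
    emails = dict((row['Name'], row['Email']) for row in csv.DictReader(f))

# Read the wards file and add the matching email to each councillor
with open('councillors-wards.csv') as f:
    reader = csv.DictReader(f)
    councillors = list(reader)
    fieldnames = reader.fieldnames + ['Email']

with open('councillors-emails.csv', 'w') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for councillor in councillors:
        councillor['Email'] = emails.get(councillor['Name'], '')
        writer.writerow(councillor)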

Joining libraries and councillors via wards

Once the libraries and councillors datasets were ready, they only needed to be joined via the common field, the ward name. I used another small Python script for this:

import csv
import json

with open('libraries_with_wards.geojson', 'r') as f:
    libraries = json.loads(f.read())

councillors = []
with open('councillors-emails.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        councillors.append(row)

def get_councillors(ward_name):
    found = []
    for councillor in councillors:
        if ward_name == councillor['ward']:
            found.append(councillor)
    return found

for library in libraries['features']:
    library['properties']['councillors'] = get_councillors(library['properties']['ward_name'])

with open('libraries_with_councillors.geojson', 'w') as f:
    f.write(json.dumps(libraries))

The end result was a GeoJSON file which, apart from the location information, contained data about the ward and the councillors for each library, just what was needed for the web map.

Of course, there are many ways in which a dataset like this one could have been produced, for example using only Python scripts or libraries like pandas, or doing the whole process in OpenRefine (also, given enough time, there is always the hard way of copying and pasting things into a spreadsheet).

Telling a story with data

Developers used to working with data may find some pleasure in nice, cleaned-up datasets per se, but most people won't share our excitement at the sight of a CSV file or a JSON API. Data needs to be visualized and used in a way that gives meaning to it.

A list of library names in a PDF does not tell you much. Even a list of libraries and the plans for them in a spreadsheet is not likely to have a big impact on you. But map these libraries across the city, clearly flagging the proposed closures, and the scale of the proposal starts becoming apparent. Put the contact details of the local councillors that represent you just one click away from the library that is meant to be closed and maybe people will find it easier to raise their concerns.

This is one of the promises and goals of Open Data: making it easier for data to find its way to citizens and have an impact on their lives.

Visit the Proposed Library Closures map here: http://amercader.github.com/savenewcastlelibraries/


All datasets and scripts mentioned in this post are available on GitHub and the DataHub.

Import RunKeeper data to PostGIS and visualize it with Torque

# Posted on Thursday, 15th November 2012 , under PostGIS CartoDB

After a long day in front of your laptop or just to think of a fresh approach to that annoying bug, what's better than going for a run?

There are amazing examples out there of the kind of stuff you can do with your running data, and more and more readily available data visualization tools open a lot of possibilities.

To make it easier for those of you who use RunKeeper to get started hacking with your running data (and to celebrate PostGIS day!), here is a small script that will import your data into PostGIS.

Getting the data

One thing I like about RunKeeper is that it does not lock your data in and makes it really easy to download. Just head to the export data page and select the date range you prefer.

RunKeeper Export

This will generate a zip file which contains two CSV files and as many GPX files as activities you have. The cardioActivities.csv file contains information about each run, such as date, total distance, average pace, etc., and, crucially, the name of the GPX file which contains the track of the run. Our aim is to get all this exported as a table with the data fields and a geometry column that contains the actual spatial data.

Running the script

You will obviously need PostgreSQL and PostGIS installed, and we are also going to use ogr2ogr to import the geometries.

Extract the zip file and cd into it. Download the following script and tweak the configuration parameters to suit your needs (you may need to change the way psql is called depending on your Postgres authorization setup).
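
As a rough sketch of what such an import involves (not the script itself), the following Python reads cardioActivities.csv and loads each activity's GPX track into a PostGIS table via ogr2ogr; the connection settings, the target table name and the CSV column name are assumptions:

import csv
import subprocess

# Assumed connection settings, adjust them to your setup
pg_connection = "PG:dbname='runkeeper' host='localhost' user='XXXXXX' password='YYYYYY'"

with open('cardioActivities.csv') as f:
    for row in csv.DictReader(f):
        # The 'GPX File' column name is an assumption about the export format
        gpx_file = row.get('GPX File', '')
        if not gpx_file:
            continue  # manually entered activities have no GPX track
        # Append the "tracks" layer of each GPX file to an "activities" table
        subprocess.check_call([
            'ogr2ogr', '-f', 'PostgreSQL',
            '-nln', 'activities', '-append',
            pg_connection, gpx_file, 'tracks',
        ])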

What next?

Once you have a nice structured table in PostGIS it can be easily loaded into a number of tools like TileMill or QGIS. One quick way of getting a nice temporal visualization of your runs is using Torque, a library powered by CartoDB. If you upload your data to CartoDB you just have to enter your user name and table name to get a visualization of your runs.

RunKeeper Torque

Here are some quick steps on how to get it working:

  • Export your data to one of the formats supported by CartoDB. Here I'm using GeoJSON:

    
    ogr2ogr -f GeoJSON activities.geojson PG:"dbname='db_test' host='localhost' port='5432' user='XXXXXX' password='YYYYYY'" TABLE_NAME

    Important: for some reason the GeoJSON import only works on CartoDB if the file is zipped, so create a zip of your file.

  • Create an account on CartoDB if you don't already have one, sign in and select "Create a new table".

  • Select "I want to start with some imported data" and pick the zipped GeoJSON file.

  • If all went right, you should see your records in a table. Select "Change data type" on the date field header and choose Date.

  • Head to http://cartodb.github.com/torque, enter your user name and table name on the top right boxes and click on "Update Torque". That's it! You can play around with the controls to change the visualization.

Hopefully this gets you inspired to play with your own data and do some nice visualizations or hacks!

OpenLayers Cookbook review

# Posted on Monday, 29th October 2012 , under OpenLayers Books

I finally managed to set aside some time to read OpenLayers Cookbook, written by Antonio Santiago. Antonio is a great developer (you should definitely check out his blog for some cool spatial development stuff) and a really nice guy, so I was looking forward to seeing how the book had turned out.

Like the rest of the Packt Cookbook series, the book is based on increasingly complex small code "recipes" that each focus on a particular aspect of OpenLayers development. The recipes are grouped in thematic chapters, starting with Web Mapping Basics, which covers topics like base layers and overlays, map controls and extent.

The next chapters deal with Raster and Vector layers, showing some examples of how to load different layer types and data formats, and exploring concepts like Protocols and Strategies, which can sometimes be tricky for new users.

The book then moves on to Events (how to listen to map, feature and DOM events) and Controls. The Controls chapter gives a really good overview of how to add them to the map and configure them to interact with the map and layers.

Next comes the Theming chapter, which goes as far as OpenLayers will let you in changing its default looks. The next one, Styling Features, is particularly comprehensive and covers all aspects of working with vector features with really nice examples, one of the areas where OpenLayers is really powerful.

The last chapter, Beyond the Basics, explores more complex stuff with a couple of nice examples of more advanced uses of OpenLayers. Creating custom controls is particularly useful once you start working on big OpenLayers based applications. Some of the other recipes are perhaps a bit specific, but they give a good introduction to some useful OpenLayers features.

In conclusion, the book is very comprehensive and well written, and a lot of effort has obviously gone into it. All the recipes are included as web examples that can be downloaded from the publisher's website and opened in a browser. Antonio is clearly an expert in the field and this shows in the code and the explanations around it. The book is more oriented towards novice and beginner users, and more experienced OpenLayers developers will probably skim through some of the recipes. All in all, a great book for your first dive into OpenLayers.

OpenLayers Cookbook

Around the world in 80 Wikipedia articles

# Posted on Monday, 19th March 2012 , under WikiGeoLinks Wikipedia Leaflet Bootstrap

(For the eager ones: I've written a new version of WikiGeoLinks with Leaflet and Bootstrap. It lets you discover interactively how the georeferenced articles of Wikipedia are connected. If you want the long story, scroll down...)

Around the world in 80 Wikipedia articles

I'm sure one of the most common ways in which people use Wikipedia is jumping to different articles from the one they were checking in the first place. At least that's how I spend most of my time on it, getting dragged into interesting stuff. This can of course be a great productivity killer, but I think this "courseless" browsing is very rewarding, because you never know what you'll end up discovering.

I've also always liked this mental image of linked Wikipedia articles forming a huge network with clusters around popular articles, and sure enough there are countless visualizations all over the Internet that show at least part of this mammoth graph.

But what about the spatial connections between Wikipedia articles? A huge number of articles have coordinates associated with them, and are obviously linked to other ones, so what would this network look like on a map? One could try to create a global, static visualization like the one of Facebook connections that came up a while ago, but as beautiful as these images are, they are not interactive. I wanted some form of the "courseless" browsing I mentioned before, one that would allow discovering interactively which articles were located around a certain area and which other articles they linked to.

That's why a while ago I wrote the first version of WikiGeoLinks, a web application that let you locate Wikipedia articles on a map and see how they were connected. This first version was built with GeoExt and OpenLayers, and used a GeoJSON web service powered by MapFish. After locating an article on the map, its linked articles were shown and you could click on them to see their connections, carrying on for as long as you wanted.

Making it nicer

I was definitely happy with the original version, and it was a great opportunity to learn in depth how GeoExt works, but to be fair it didn't look that nice. Ext based applications have a very specific and valid use case, but I wanted the site to be lighter and more attractive, both visually and in the code.

The first step was switching the default OpenStreetMap tiles to the ones that have become my first choice in all my recent projects: MapQuest-OSM. They offer direct access to the tiles, which are OSM based and look gorgeous (especially at small scales), so what's not to love about them?

Leaflet Logo

I had played around a bit with Leaflet before, but this was a good opportunity to use it in a complete project, learn how it worked and see how it compared to OpenLayers, the mapping library I have the most experience with. Praising Leaflet at this stage may seem unnecessary, but I'll do it anyway: it is a joy to work with. Its API is clean, simple and intuitive. It is fast, easy to extend and ridiculously lightweight. If you are used to OpenLayers though, you may find some things confusing at first. I'll try to write another blog post going into more detail about this.

Finally, to glue it all together I've used Bootstrap, which is quickly becoming the cornerstone of dozens of websites. It has turned out to be a great tool for online mapping applications, and the widgets it offers are easy to integrate as traditional map controls like toolbars or layer switchers.

What's next?

There are a number of things I would like to add or improve, here are a few of them:

  • The underlying data is outdated and does not include all georeferenced articles. I plan to use the more up to date data available at the Wikipedia-World project.

  • The GeoJSON service can be a little bit slow. It can probably be optimized in some ways (or maybe I need to upgrade my Linode...).

  • I would love to have some way to generate a permalink to a series of linked articles, so you can save your "routes".

  • It can definitely be made more mobile friendly, for instance avoiding hover events.

And some other, more vague ideas (some of them definitely pointless):

  • Maybe the Wikipedia API can be used to pull some more content about the articles in the popups (summary, pictures...).

  • Can you build a Wikipedia "route planner"? e.g can I go from La Paz to Matsushima via georeferenced Wikipedia articles?

  • How cool would it be to be able to do the same on a 3D Globe?

I hope you like it and spend some time wandering around. I would love to hear your comments, and also feel free to add any issues you may find to GitHub, where you can also find the source code.

2011 Wrap up

# Posted on Saturday, 24th December 2011 , under Meta Open Knowledge Foundation CKAN

Let's be completely honest, we all knew it was going to happen. You start a new blog (maybe you've even built the engine yourself), add some enthusiastic new posts, start publishing fairly regularly... And then at some point you start procrastinating a bit, telling yourself that next weekend you are going to write about this new thing you've found, and before you know it, it has been eight months since your last post.

Well, in my defence I have to say it has been a pretty busy year. Last March I started working full time for the Open Knowledge Foundation. The OKF is a not-for-profit devoted to promoting open access to knowledge and data, and it is involved in a ridiculous amount of awesome projects. Check out, for instance, the annotated works of Shakespeare; OpenSpending, an effort to structure and visualize government spending; and one of my favourites, the Public Domain Review. You can find more active projects and mailing lists on the OKF website, and if you want to meet brilliant and talented people do not hesitate to get involved!

One of the main projects of the OKF, and the one I work full-time on, is CKAN, an open source data catalog platform. The aim of CKAN is to provide a robust and flexible platform for publishing, discovering and sharing data. It powers institutional catalogs like the British and Norwegian government open data catalogs, as well as community instances like The Data Hub or the ones maintained in different countries.

The first project I was involved with was the UK Location Programme. It is integrated with Data.gov.uk (DGU), the British catalog mentioned above, and acts as the spatial data infrastructure supporting the pan-European INSPIRE directive. A lot of the work on this front was related to harvesting multiple CSW servers to aggregate their records into the DGU portal.

Another really interesting OKF project I worked on was Publicdata.eu, an experimental portal that aggregated open datasets from catalogs across Europe. My colleague Friedrich Lindenberg wrote a nice post presenting it if you want to know more. I was even able to squeeze a couple of maps in there!

One of the projects I've been most involved with recently is the IATI Registry, also powered by CKAN. IATI is an international initiative to promote transparency in aid spending. It provides a common standard in which funding organisations can publish their activities, allowing stakeholders and third parties to analyze and visualize this spending data. The registry provides a centralized point where organisations can publish their IATI compliant files so users can access them via the web interface or the API.

And of course, apart from working on specific projects I've been involved in the usual day to day tasks in a big open source project like CKAN, building features, fixing bugs, supporting users... All of this with a brilliant (remote) team and learning an awful lot of stuff on the way!

So all in all, 2011 has been an intense and exciting year, and it looks like 2012 won't disappoint either, with new projects and features coming up.

Hopefully it won't take me months to tell you about them.

Happy New Year!