[OpenSpending] Getting Greater London Authority spending into OpenSpending - an Update

Rufus Pollock rufus.pollock at okfn.org
Wed Apr 3 20:44:14 UTC 2013

Hi All,

I've been working to get Greater London Authority data into OpenSpending
(as mentioned in the mail last week [1]). I'm doing this motivated by a
basic question:

*Which companies got paid the most (and for doing what)?* (OS thingstodo
issue <https://github.com/openspending/thingstodo/issues/5>)

I wanted to share where I'm up to and some of the experience so far as I
think these can inform our wider efforts - and illustrate why this is

First off, I'm keeping the code and README for this work in a repo on
GitHub: https://github.com/rgrp/dataset-gla

*## Data Quality Issues*

This will be a familiar lament to many (more on all of this in the

There are 61 CSV files as of March 2013 (a list can be found in

Unfortunately the "format" varies substantially across files (even though
they are all CSV!), which makes using this data a real pain. Some examples:

* The number of fields and their names vary across files (e.g. "SAP
Document no" vs "Document no")
* The number of blank columns and blank lines varies (some files have no
blank lines (good!), many have blank lines plus some metadata etc.)
* There is also at least one "bad" file which looks to be an Excel file
saved as CSV
* Amounts are frequently formatted with "," as a thousands separator, so
they read as strings rather than numbers to computers
* Dates vary substantially in format, e.g. "16 Mar 2011", "21.01.2011" etc.
* There is no unique transaction number (the document number is a possible
candidate)
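
To make the issues above concrete, here is a minimal Python sketch of the
kind of normalisation they call for. It is not the code from the repo. The
names `HEADER_ALIASES`, `normalise_header`, `parse_amount`, `parse_date` and
`clean_rows` are hypothetical, and it assumes the first non-blank row of a
file is the header, which will not hold for files that start with metadata.

```python
import csv
from datetime import datetime
from decimal import Decimal

# Hypothetical mapping from header variants to one canonical name.
HEADER_ALIASES = {
    "sap document no": "document_no",
    "document no": "document_no",
}

# The two date formats mentioned above; extend as new ones turn up.
# Note: %b (month abbreviation) depends on the locale; English is assumed.
DATE_FORMATS = ("%d %b %Y", "%d.%m.%Y")


def normalise_header(name):
    """Lower-case, collapse whitespace and map known variants to one name."""
    key = " ".join(name.strip().lower().split())
    return HEADER_ALIASES.get(key, key.replace(" ", "_"))


def parse_amount(text):
    """Strip thousands separators, e.g. '1,234.50' -> Decimal('1234.50')."""
    return Decimal(text.strip().replace(",", ""))


def parse_date(text):
    """Try each known format in turn and return a datetime.date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError("Unrecognised date format: %r" % text)


def clean_rows(lines):
    """Yield one dict per data row, skipping blank lines and blank columns.

    Assumption: the first non-blank row is the header row.
    """
    header = None
    for row in csv.reader(lines):
        if not any(cell.strip() for cell in row):
            continue  # blank line
        if header is None:
            header = [normalise_header(cell) for cell in row]
            continue
        # Columns with an empty header name are treated as blank padding.
        yield {key: value for key, value in zip(header, row) if key}
```

`clean_rows` accepts an open file or any iterable of lines, so each of the
CSV files could be passed through it before `parse_amount` and `parse_date`
are applied to the relevant columns.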

They also switched from monthly reporting to period reporting (where there
are 13 periods of approx. 28 days each).
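
For comparing period-based files with the earlier monthly ones, a rough
mapping from period number to date can help. The sketch below is only an
approximation under two assumptions that are mine, not confirmed by the
data: that the financial year starts on 1 April, and that every period is
exactly 28 days. The function name `approx_period_start` is hypothetical.

```python
from datetime import date, timedelta


def approx_period_start(fy_start_year, period):
    """Approximate start date of a reporting period (1 to 13).

    Assumes the financial year starts on 1 April and that each period is
    exactly 28 days; real period boundaries may differ.
    """
    if not 1 <= period <= 13:
        raise ValueError("period must be between 1 and 13")
    return date(fy_start_year, 4, 1) + timedelta(days=28 * (period - 1))
```

Since 13 x 28 = 364, the real boundaries must drift by at least a day
somewhere in the year, so this is good enough for bucketing but not for
exact dating.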

*## Progress so far*

I do have one month loaded (Jan 2013) with a nice breakdown by "Expenditure

Because of the data wrangling issues, I have not yet got all the data
loaded. What I have done so far is:

- Archived all the data here (in case it gets moved)

