[OpenSpending] Getting Greater London Authority spending into OpenSpending - an Update

Rufus Pollock rufus.pollock at okfn.org
Wed Apr 3 20:53:07 UTC 2013

Hi All (apologies if you just got a similar email - it may have been sent early)

TL;DR: http://openspending.org/gb-local-gla and the

I've been working to get Greater London Authority data into OpenSpending
(as mentioned in the mail last week [1]). I'm doing this motivated by a
basic question:

*Which companies got paid the most (and for doing what)?* (OS thingstodo
issue <https://github.com/openspending/thingstodo/issues/5>)

I wanted to share where I'm up to and some of the experience so far, as I
think it can inform our wider efforts - and illustrate the challenges of
just getting and cleaning up data.

*## Data Quality Issues*

First off, I'm keeping the code and
README <https://github.com/rgrp/dataset-gla#readme> for this work in a
repo on GitHub.

There are 61 CSV files as of March 2013 (a list can be found in

Unfortunately the "format" varies substantially across files (even though
they are all CSV!), which makes using this data a real pain. Some examples:

* The number of fields and their names vary across files (e.g. "SAP
Document no" vs "Document no")
* The number of blank columns and blank lines varies (some files have no
blank lines (good!), many have blank lines plus some metadata etc.)
* There is also at least one "bad" file which looks to be an Excel file
saved as CSV
* Amounts are frequently formatted with "," making them appear as strings
to computers
* Dates vary substantially in format, e.g. "16 Mar 2011", "21.01.2011" etc.
* There is no obvious unique transaction number (possibly the document
number could serve)
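To make the issues above concrete, here is a minimal Python sketch of the kind of normalisation they call for. It is not the clean-up script in the repo (that one is JavaScript); the function names, the alias table, the "Date" and "Amount" column names and the sample row are all made up for illustration. It also assumes the first non-blank row is the header, which will not hold for files that start with metadata lines.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Hypothetical alias table: maps header variants to one canonical name.
HEADER_ALIASES = {
    "sap document no": "document_no",
    "document no": "document_no",
}

# The two date layouts quoted above; other files may need more entries.
# Note that %b (abbreviated month name) depends on the current locale.
DATE_FORMATS = ("%d %b %Y", "%d.%m.%Y")


def normalise_header(name):
    """Lower-case and trim a column name, then apply known aliases."""
    key = " ".join(name.strip().lower().split())
    return HEADER_ALIASES.get(key, key.replace(" ", "_"))


def parse_amount(text):
    """Strip "," from an amount such as '1,234.50' and return a Decimal."""
    return Decimal(text.replace(",", "").strip())


def parse_date(text):
    """Try each known layout and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def read_rows(handle):
    """Yield one dict per data row, skipping blank lines and unnamed columns."""
    rows = (r for r in csv.reader(handle) if any(c.strip() for c in r))
    header = [normalise_header(c) for c in next(rows)]
    for row in rows:
        yield {h: v.strip() for h, v in zip(header, row) if h}


# Made-up sample with a blank column and a blank line.
sample = io.StringIO(
    "SAP Document no,,Date,Amount\n"
    "\n"
    '5100012345,,16 Mar 2011,"1,234.50"\n'
)
for row in read_rows(sample):
    print(row["document_no"], parse_date(row["date"]), parse_amount(row["amount"]))
```

Using Decimal rather than float keeps the pence exact when amounts are later summed.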

The GLA also switched from monthly reporting to period reporting (where
there are 13 periods of approx 28 days each).
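A quick sketch of why period reporting does not line up with calendar months: 13 periods of 28 days cover 364 days, so period boundaries drift against month ends. The function name is hypothetical, "approx 28 days" is treated as exactly 28, and the 1 April start date is purely an assumption for illustration; the real period boundaries may well differ.

```python
from datetime import date, timedelta

PERIODS_PER_YEAR = 13
DAYS_PER_PERIOD = 28  # "approx 28 days" in the text; treated as exact here


def approx_period_range(year_start, period):
    """Return (first_day, last_day) for a 1-based period number."""
    if not 1 <= period <= PERIODS_PER_YEAR:
        raise ValueError("period must be between 1 and 13")
    first = year_start + timedelta(days=(period - 1) * DAYS_PER_PERIOD)
    last = first + timedelta(days=DAYS_PER_PERIOD - 1)
    return first, last


# Assumed start of the reporting year; not stated in the source.
start = date(2012, 4, 1)
for period in (1, 2, 13):
    first, last = approx_period_range(start, period)
    print(period, first.isoformat(), last.isoformat())
```

Under these assumptions the thirteenth period ends a day or two before the year does, which is one reason to load period files with explicit start and end dates rather than a month label.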

*## Progress so far*

I do have one month loaded (Jan 2013) with a nice breakdown by "Expenditure


Interestingly after some fairly standard grants to other bodies, "Claim
comes in as the biggest item at £2.3m

- Data getting archived at
- Clean up script <https://github.com/rgrp/dataset-gla/blob/master/scripts/process.js>



[1]: http://lists.okfn.org/pipermail/openspending/2013-March/001664.html