[OpenSpending] Getting Greater London Authority spending into OpenSpending - an Update

Lucy Chambers lucy.chambers at okfn.org
Wed Apr 10 19:16:56 UTC 2013


Thank you, Rufus - please keep us updated if you discover anything
interesting :)


On 3 April 2013 21:44, Rufus Pollock <rufus.pollock at okfn.org> wrote:

> Hi All,
>
> I've been working to get Greater London Authority data into OpenSpending
> (as mentioned in the mail last week [1]). I'm doing this motivated by a
> basic question:
>
> *Which companies got paid the most (and for doing what)?* (OS thingstodo
> issue <https://github.com/openspending/thingstodo/issues/5>)
>
> I wanted to share where I've got to and some of my experience so far, as I
> think it can inform our wider efforts and illustrate why this is
> challenging.
>
> First off, I'm keeping the code and README for this work in a repo on
> GitHub: https://github.com/rgrp/dataset-gla
>
> *## Data Quality Issues*
>
> This will be a familiar lament to many (more on all of this in the
> readme <https://github.com/rgrp/dataset-gla#readme>).
>
> There are 61 CSV files as of March 2013 (a list can be found in
> scrape.json <https://github.com/rgrp/dataset-gla/blob/master/scrape.json>).
>
> Unfortunately the "format" varies substantially across files (even though
> they are all CSV!), which makes using this data a real pain. Some examples:
>
> * The number of fields and their names vary across files (e.g. "SAP
> Document no" vs "Document no")
> * The number of blank columns and blank lines varies (some files have no
> blank lines (good!), many have blank lines plus some metadata, etc.)
> * There is also at least one "bad" file which looks to be an Excel file
> saved as CSV
> * Amounts are frequently formatted with "," making them appear as strings
> to computers.
> * Dates vary substantially in format e.g. "16 Mar 2011", "21.01.2011" etc
> * No unique transaction number (possibly document number)
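
The amount, date and header problems listed above could be normalised along
these lines. This is only a sketch in Python, not the code in the dataset-gla
repo: the function names and the HEADER_ALIASES table are made up for
illustration, and only the "SAP Document no" / "Document no" pair and the two
date layouts come from the list above. Real files will probably need more
entries in both.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical alias table: maps header variants to one canonical name.
# Only the "SAP Document no" / "Document no" pair is taken from the files.
HEADER_ALIASES = {
    "sap document no": "document_no",
    "document no": "document_no",
}

# The two date layouts quoted above; other files may use further layouts.
# Note: %b (abbreviated month name) is locale-dependent and assumes an
# English/C locale here.
DATE_FORMATS = ("%d %b %Y", "%d.%m.%Y")


def canonical_header(name: str) -> str:
    """Lower-case, collapse whitespace, then apply the alias table."""
    key = " ".join(name.strip().lower().split())
    return HEADER_ALIASES.get(key, key.replace(" ", "_"))


def parse_amount(text: str) -> Decimal:
    """Turn an amount such as '1,234.56' into a Decimal by dropping commas."""
    return Decimal(text.strip().replace(",", ""))


def parse_date(text: str) -> date:
    """Try each known layout in turn; fail loudly on anything else."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def is_blank(row) -> bool:
    """True for rows made up only of empty or whitespace-only cells."""
    return not any(cell.strip() for cell in row)
```

Decimal is used rather than float so that money values are not subject to
binary rounding. Raising on an unknown date layout, instead of guessing,
makes it easier to spot a file that introduces yet another format.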
>
> They also switched from monthly reporting to period reporting (where there
> are 13 periods of approx. 28 days each).
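
To line period-based files up with the monthly ones, the period number has to
be turned into a date range. A rough sketch in Python, assuming every period
is exactly 28 days and that the caller supplies the first day of the
reporting year; neither assumption is confirmed by the data, and the
period_bounds helper is hypothetical:

```python
from datetime import date, timedelta

PERIOD_DAYS = 28        # "approx 28d" in the source; assumed exact here
PERIODS_PER_YEAR = 13


def period_bounds(year_start: date, period: int) -> tuple:
    """Return (first_day, last_day) for a 1-based reporting period.

    year_start is an assumption supplied by the caller; the published
    files do not state where period 1 begins.
    """
    if not 1 <= period <= PERIODS_PER_YEAR:
        raise ValueError(f"period must be 1..{PERIODS_PER_YEAR}, got {period}")
    first = year_start + timedelta(days=PERIOD_DAYS * (period - 1))
    last = first + timedelta(days=PERIOD_DAYS - 1)
    return first, last
```

Since 13 x 28 = 364, fixed-length periods fall a day or two short of a
calendar year, so the real period boundaries must differ slightly from this
approximation somewhere.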
>
> *## Progress so far*
>
> I do have one month loaded (Jan 2013) with a nice breakdown by
> "Expenditure Account":
>
>
> Due to the data wrangling issues so far I have not got all the data
> loaded. What I have done is:
>
> - Archived all the data here (in case it gets moved)
> -
>
> [1]: http://lists.okfn.org/pipermail/openspending/2013-March/001664.html
>
> _______________________________________________
> openspending mailing list
> openspending at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/openspending
> Unsubscribe: http://lists.okfn.org/mailman/options/openspending
>
>


-- 
*Project Coordinator*
School of Data <http://schoolofdata.org/> and
OpenSpending <http://openspending.org/>
Projects of the Open Knowledge Foundation <http://okfn.org/>
Support our work <http://okfn.org/support/>.