[wdmmg-dev] Making sense of the OpenSpending UK Govt 25k data dump

Nick Stenning nick at whiteink.com
Sat Sep 17 13:18:29 UTC 2011


> On Fri, Sep 16, 2011 at 10:43 AM, Chris Taggart <countculture at gmail.com>
> wrote:
>
> I've stuck the JSON/CSV download problem in as issue #211

Just to say I'm looking into this right now, although I am having
difficulty replicating it locally. Will let you know what I find.

>> 1) It's not clear (to me) which is the OpenSpending primary ID for the
>> transaction, and how to go from that to the URL of the transaction on the
>> OpenSpending Page

Transactions have a primary ID field, "_id", which is included in
the JSON and CSV downloads for each transaction (or "entry").

The URL for a given transaction is simply http://openspending.org/entry/<_id>
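
For anyone scripting against the downloads, here is a minimal Python sketch of going from a CSV download to entry URLs. It assumes the CSV header contains a column named exactly "_id" and that the URL pattern above holds; the helper names `entry_url` and `entry_urls` are just for illustration.

```python
import csv
import io
from urllib.parse import quote

# URL pattern from this message; "{}" is replaced with the entry's "_id".
ENTRY_URL = "http://openspending.org/entry/{}"


def entry_url(entry_id):
    """Build the OpenSpending page URL for a single transaction ("entry")."""
    # Percent-encode defensively, since we assume nothing about the _id format.
    return ENTRY_URL.format(quote(entry_id, safe=""))


def entry_urls(csv_text):
    """Yield (_id, url) pairs for every row of a CSV download.

    Assumes the header row has a column named exactly "_id".
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield row["_id"], entry_url(row["_id"])
```

For a JSON download the same `entry_url` helper should apply to each record's "_id" value, assuming the records are exposed as objects with that key.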

I could tell you the gory details of how this _id is computed from the
source data, but I reckon that if _id isn't serving its purpose for
you, we should fix that rather than causing others to rely on our
implementation details. Do let me/the list know if any of this isn't
clear.

>> 2) The department names are a bit of a mess.

Yes. I'm not clear on whether the normalisation of these names was
done better in the data that was loaded until last week, although I
have a feeling it might have been.

There are two issues here, one short term and one long term:

1) We need to clean up the 25k data properly with a session in Google
Refine, and probably reload it.

2) We are discussing this issue of reconciliation and its role in
OpenSpending frequently in IRC (#openspending on Freenode). At the
moment, we think we need to clearly separate the concepts of
"dataset-local" entities (of which there could be "Department for
Media, Culture and Sport" and "DCMS" within the same dataset!) and
"global" entities. We would then provide an API and tools for drawing the
links between these "local" entities and a unique "global" entity. All
API calls to the dataset would then provide access to the normalised
data and the "global" entity.
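
To make the local/global split concrete, here is a rough Python sketch of one way those links could be stored. It is purely illustrative and not OpenSpending's actual model or API: `EntityRegistry`, `GlobalEntity`, the dataset key "uk-25k" and the ID "gb-dcms" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class GlobalEntity:
    """A single canonical ("global") entity, e.g. one government department."""
    global_id: str
    name: str


@dataclass
class EntityRegistry:
    """Hypothetical store of links from dataset-local names to global entities."""
    entities: Dict[str, GlobalEntity] = field(default_factory=dict)
    # (dataset, local name as it appears in that dataset) -> global_id
    links: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_global(self, entity: GlobalEntity) -> None:
        self.entities[entity.global_id] = entity

    def link(self, dataset: str, local_name: str, global_id: str) -> None:
        """Record that a local name in one dataset refers to a global entity."""
        if global_id not in self.entities:
            raise KeyError(f"unknown global entity: {global_id}")
        self.links[(dataset, local_name)] = global_id

    def resolve(self, dataset: str, local_name: str) -> Optional[GlobalEntity]:
        """Return the linked global entity, or None if not yet reconciled."""
        global_id = self.links.get((dataset, local_name))
        return None if global_id is None else self.entities[global_id]


# Two local spellings in one (hypothetical) dataset, one global entity.
registry = EntityRegistry()
registry.add_global(GlobalEntity("gb-dcms", "Department for Culture, Media and Sport"))
registry.link("uk-25k", "Department for Media, Culture and Sport", "gb-dcms")
registry.link("uk-25k", "DCMS", "gb-dcms")
```

Keying links on the dataset as well as the name leaves the raw names untouched in each dataset, lets the same string be linked differently in two datasets, and means unreconciled names simply resolve to None until someone draws the link.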

>> 3) There are lots of entries which are not departments, and some entities
>> that are local govt (which means we'd duplicate with the OpenlyLocal data
>> we're importing).

This I can't speak to, but ultimately this is about the data cleanup
that (at the moment at least) needs to happen *before* the data goes
into OpenSpending.

>> [it] seems crazy for OpenCorporates to duplicate the
>> OpenSpending stuff, or spend time cleaning it up and in the process making
>> it impossible to link back to the correct OpenSpending entry.

I certainly agree on that point!

Let's keep talking about these issues and we'll do our best to fix as
many of them as we can in the short term, and provide tools for fixing
them in the longer term.

Best wishes,
Nick




More information about the openspending-dev mailing list