[okfn-labs] Public bodies and organisation identifiers

Rufus Pollock rufus.pollock at okfn.org
Fri Jul 5 17:22:42 UTC 2013


Hi Mark,

This is great, thanks for writing this up. I can add some comments in the
issue https://github.com/okfn/publicbodies/issues/41 but also some
comments inline below.

Rufus

On 5 July 2013 13:06, Mark Brough <mark.brough at publishwhatyoufund.org> wrote:
> Hi guys,
>
> This is an idea that I've been thinking about for a while. I discussed it with Rufus a week or so ago and wanted to share it with the list to see what everyone thinks.
>
> The short version: could public bodies be used to generate usable organisation identifiers?

I think it most definitely could be. The question is how formal that
"generation" process is.

[...]

> Background
> ==========
[...]

> For multilaterals, we use the following methodology:
> [OECD/DAC Channel code]
> e.g. for the World Bank's International Development Association (IDA):
> 44002

This set seems reasonably well sorted out. By the way, is there an
existing list of those orgs for which there *are* already codes?

> Problems
> ========
> Agency codes
> * Agency codes only include donor agencies. So the Ministry of Finance in Botswana, for example, does not have a code.
> * Agency codes don't even include all donor agencies: for example, parts of the European Commission or the United States, even though they give aid, don't have their own identifier - they're categorised under "Miscellaneous".
> * The process for adding new agency codes is slow (even if it took a day, that might be too long)

One might want a "user-space" versus "approved-space" set of codes.

> Channel codes
> * Channel codes only contain a subset of all of the multilateral / international / intergovernmental organisations in the world, and many of them are not listed in a very usable way. For example, the World Health Organisation has two codes:
> a) World Health Organisation - core voluntary contributions account
> b) World Health Organisation - assessed contributions
> --> but there isn't one for just "World Health Organisation", for example if you're contracting them to deliver a project.
>
> Many organisations publishing IATI data will therefore struggle to provide unique organisation identifiers for many of the public sector / international organisations that they are working with

Good examples.

> Rationale
> ========
> * Official lists of organisations should be used if possible.
> * Official lists of organisations don't exist in most cases.
> * The exact identifier assigned to an organisation is not fundamentally important (whether it's BW-1 or BW-21, the Botswana Ministry of Finance just needs a code).
> * Organisation identifiers should be cross-mapped to other codes / identifiers for those organisations so that the data is easily interoperable.
>
> Proposal
> ========
> Fuzzy reconciliation / text matching of organisations, with an API that assigns an existing identifier where available, and creates a new one where it's not available
>
> 1) Organisations (initially, preferably those with a large amount of data) throw four key pieces of data at the API:
> * organisation name (text) - e.g. MINISTRY OF FINANCE
> * organisation country (code) - e.g. BW (for Botswana)
> * language (code) - e.g. en
> * last recorded transaction with this organisation (date) - e.g. 2013-07-05
>
> 2) the API responds with one of the following (possibly using HTTP status codes?):
> a) Organisation found => use code "BW-1"
> b) Organisation not found => created code "BW-21"
>
>    it also stores the data about the last recorded transaction, so that other people know that that organisation *may have* existed on that date.

Sounds good but this adds some complexity and makes it quite bespoke ;-)
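
To make the request/response flow concrete, here is a minimal sketch, not a proposed implementation. Everything in it is an assumption for illustration: the `Registry` class and `reconcile` method are hypothetical names, the store is in memory, fuzzy matching is approximated with Python's `difflib`, the 0.9 threshold is arbitrary, new codes are numbered sequentially per country, and "found" / "created" are mapped to HTTP 200 / 201.

```python
from dataclasses import dataclass, field
from datetime import date
from difflib import SequenceMatcher


@dataclass
class Registry:
    """In-memory stand-in for the identifier store (hypothetical)."""
    threshold: float = 0.9  # assumed similarity cut-off, not part of the proposal
    orgs: dict = field(default_factory=dict)       # code -> normalised name
    last_seen: dict = field(default_factory=dict)  # code -> latest transaction date
    counters: dict = field(default_factory=dict)   # country -> last number issued

    def reconcile(self, name, country, language, last_transaction):
        """Return (status, code): 200 if matched, 201 if a new code was created.

        `language` is accepted to mirror the proposal but unused in this sketch.
        """
        # Normalise case and whitespace before comparing names.
        wanted = " ".join(name.lower().split())
        best_code, best_score = None, 0.0
        for code, known in self.orgs.items():
            if not code.startswith(country + "-"):
                continue  # only compare within the same country
            score = SequenceMatcher(None, wanted, known).ratio()
            if score > best_score:
                best_code, best_score = code, score
        if best_code is not None and best_score >= self.threshold:
            status, code = 200, best_code
        else:
            # No close enough match: mint the next code for this country.
            self.counters[country] = self.counters.get(country, 0) + 1
            code = f"{country}-{self.counters[country]}"
            self.orgs[code] = wanted
            status = 201
        # Record the most recent date on which the organisation was seen.
        previous = self.last_seen.get(code)
        if previous is None or last_transaction > previous:
            self.last_seen[code] = last_transaction
        return status, code


registry = Registry()
print(registry.reconcile("MINISTRY OF FINANCE", "BW", "en", date(2013, 7, 5)))
print(registry.reconcile("Ministry of  Finance", "BW", "en", date(2013, 7, 6)))
```

In this sketch the first call creates a code and the second call, after normalisation, matches it. A real service would need a better matcher and persistent storage, which is where the bespoke complexity comes in.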

> Another source could be Charts of Accounts, existing lists (like those that exist on PB already), budget documents, and structured spending data, e.g. from OpenSpending.
>
> Dealing with duplicates:
> ========================
> This will probably lead to some duplicates being created. There could be some manual reconciliation for this. Organisations could have a primary identifier and several secondary identifiers that were used by duplicate organisations.
>
> Dealing with changing organisations:
> ====================================
> Organisations can be created / deleted / merged in the real world. This should probably lead to:
> a) created - a new identifier gets created;
> b) merged - a new identifier gets created for the new organisation; and (manually) the old organisations are linked / related to the new organisation;
> c) deleted - the identifier continues to exist, because old (and possibly future) data will still refer to it. However, it should be (manually) marked as no longer existing, pointing to a successor organisation if one exists (with some flag to explain whether it's a wholly .

This is definitely the most complex bit.
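
One way to picture the duplicate and merge rules is as links between records that are never deleted. The sketch below is only an illustration under assumptions: `OrgRecord`, `merge` and `resolve` are hypothetical names, the codes BW-30 and BW-40 are made up, and following a successor link only when there is exactly one successor is my own simplification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OrgRecord:
    code: str
    active: bool = True                    # False once marked as no longer existing
    duplicate_of: Optional[str] = None     # primary identifier, if this is a duplicate
    successors: list = field(default_factory=list)  # codes of successor organisations


def merge(records, old_codes, new_code):
    """Create new_code and link each old organisation to it as its successor."""
    records[new_code] = OrgRecord(new_code)
    for code in old_codes:
        records[code].active = False       # kept, because old data still refers to it
        records[code].successors.append(new_code)


def resolve(records, code):
    """Follow duplicate and single-successor links to the current identifier."""
    seen = set()
    while code not in seen:                # guard against accidental cycles
        seen.add(code)
        record = records[code]
        if record.duplicate_of is not None:
            code = record.duplicate_of
        elif not record.active and len(record.successors) == 1:
            code = record.successors[0]
        else:
            break
    return code


records = {c: OrgRecord(c) for c in ("BW-1", "BW-21", "BW-30")}
records["BW-21"].duplicate_of = "BW-1"     # manual reconciliation of a duplicate
merge(records, ["BW-1", "BW-30"], "BW-40")
print(resolve(records, "BW-21"))
```

Here the duplicate BW-21 resolves to its primary BW-1 and then on to the merged organisation BW-40, while all three old identifiers stay in the store.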



