[okfn-labs] Public bodies and organisation identifiers
tfmorris at gmail.com
Fri Jul 5 15:52:39 UTC 2013
A few random comments:
- You probably want to make it a little harder to create a new entity than
just a single failed lookup
- If you have a Merge function, you're eventually going to need a Split to
undo the bad merges or the conflated organizations
- You might want to look at Freebase since it's got:
  - split, merge, delete flagging with community voting on the flags
    built in
  - schema for non-profit organizations and government agencies with
    successors, predecessors, alternate names, start/end dates, etc
  - an existing DB of 13K non-profits and 5K government agencies that can
    be used as a starting point (most linked to Wikipedia, many linked to
    CorpWatch/SEC IDs and other strong identifiers)
  - it's world writable and openly licensed
  - it's got a reconciliation service available as well as an autocomplete
    widget for simple lookups
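
To illustrate the Merge/Split point above, here is a minimal sketch of a
merge that records enough to be undone later. It assumes a hypothetical
in-memory registry; the Registry class, its fields and the example codes
are illustrative only and are not Freebase's API.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical in-memory store; all names here are illustrative."""
    # identifier -> set of names recorded for that organisation
    names: dict = field(default_factory=dict)
    # merged-away identifier -> (survivor, its names, names the survivor gained)
    merged: dict = field(default_factory=dict)

    def merge(self, keep: str, drop: str) -> None:
        """Fold `drop` into `keep`, remembering what the merge changed."""
        dropped_names = self.names.pop(drop)
        # Only names `keep` did not already have are taken back on split.
        added = dropped_names - self.names[keep]
        self.merged[drop] = (keep, dropped_names, added)
        self.names[keep] |= dropped_names

    def split(self, dropped: str) -> None:
        """Undo an earlier merge of `dropped` into its survivor."""
        keep, dropped_names, added = self.merged.pop(dropped)
        self.names[dropped] = dropped_names
        self.names[keep] -= added
```

The sketch only undoes the name changes made by the merge itself. Names
added to the surviving record after the merge stay where they are, so a
real Split would still need some manual review.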
On Fri, Jul 5, 2013 at 8:06 AM, Mark Brough <
mark.brough at publishwhatyoufund.org> wrote:
> Hi guys,
> This is an idea that I've been thinking about for a while. I discussed it
> with Rufus a week or so ago and wanted to share it with the list to see
> what everyone thinks.
> The short version: could public bodies be used to generate usable
> organisation identifiers?
> Pasted below, but also submitted as an issue here:
> Look forward to hearing your thoughts...
> The IATI Standard is an XML based format for sharing detailed information
> about aid projects. Fundamentally, the model shows resource flows from one
> organisation to another, with various classifications in between and many
> financial transactions as part of each project. So like this:
> activity (DFID -> World Health Organisation)
> - transaction (GBP 500 disbursed on 2013-05-01)
> - transaction (GBP 500 disbursed on 2013-07-05)
> For the private sector and NGOs, the methodology for uniquely identifying
> organisations is:
> [Jurisdiction]-[National registration body]-[Number]
> e.g. for Oxfam GB, registered at the Charity Commission, with reg number
> For governments, the following methodology is used:
> [Jurisdiction]-[OECD/DAC Agency code]
> e.g. for the UK's Department for International Development:
> For multilaterals, we use the following methodology:
> [OECD/DAC Channel code]
> e.g. for the World Bank's International Development Association (IDA):
> Agency codes
> * Agency codes only include donor agencies. So the Ministry of Finance in
> Botswana, for example, does not have a code.
> * Agency codes don't even include all donor agencies: for example, parts
> of the European Commission or the United States, even though they give aid,
> don't have their own identifier - they're categorised under "Miscellaneous".
> * The process for adding new agency codes is slow (even if it took a day,
> that might be too long)
> Channel codes
> * Channel codes only contain a subset of all of the multilateral /
> international / intergovernmental organisations in the world, and many of
> them are not listed in a very usable way. For example, the World Health
> Organisation has two codes:
> a) World Health Organisation - core voluntary contributions account
> b) World Health Organisation - assessed contributions
> --> but there isn't one for just "World Health Organisation", for example
> if you're contracting them to deliver a project.
> Many organisations publishing IATI data will therefore struggle to provide
> unique organisation identifiers for many of the public sector /
> international organisations that they are working with.
> * Official lists of organisations should be used if possible.
> * Official lists of organisations don't exist in most cases.
> * The exact identifier assigned to an organisation is not fundamentally
> important (whether it's BW-1 or BW-21, the Botswana Ministry of Finance
> just needs a code).
> * Organisation identifiers should be cross-mapped to other codes /
> identifiers for those organisations so that the data is easily
> Fuzzy reconciliation / text matching of organisations, with an API that
> assigns an existing identifier where available, and creates a new one where
> it's not available
> 1) Organisations (initially, preferably those with a large amount of data)
> throw four key pieces of data at the API:
> * organisation name (text) - e.g. MINISTRY OF FINANCE
> * organisation country (code) - e.g. BW (for Botswana)
> * language (code) - e.g. en
> * last recorded transaction with this organisation (date) - e.g. 2013-07-05
> 2) the API responds with one of the following (possibly using HTTP status
> codes):
> a) Organisation found => use code "BW-1"
> b) Organisation not found => created code "BW-21"
> it also stores the data about the last recorded transaction, so that
> other people know that that organisation *may have* existed on that date.
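
The lookup-or-create flow described in steps 1 and 2 could be sketched
roughly as follows. This is only an illustration under stated
assumptions: the Reconciler class, the 0.9 similarity threshold, the use
of Python's difflib for fuzzy matching, and the 200/201 status values
are all choices made for the example, not part of the proposal.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class Org:
    code: str
    name: str
    country: str
    language: str
    last_seen: date  # organisation *may have* existed on this date


class Reconciler:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # assumed cut-off, not from the proposal
        self.orgs = []

    def _next_code(self, country: str) -> str:
        # Codes are assumed to look like "BW-1": country, hyphen, counter.
        used = [int(o.code.split("-")[1]) for o in self.orgs
                if o.country == country]
        return f"{country}-{max(used, default=0) + 1}"

    def reconcile(self, name, country, language, last_seen):
        """Return (status, code): 200 if matched, 201 if newly created."""
        wanted = name.casefold().strip()
        best, best_score = None, 0.0
        for org in self.orgs:
            if org.country != country:
                continue
            score = SequenceMatcher(None, wanted, org.name.casefold()).ratio()
            if score > best_score:
                best, best_score = org, score
        if best is not None and best_score >= self.threshold:
            # Keep the most recent transaction date seen for this body.
            best.last_seen = max(best.last_seen, last_seen)
            return 200, best.code
        org = Org(self._next_code(country), name.strip(), country,
                  language, last_seen)
        self.orgs.append(org)
        return 201, org.code
```

Note that this version mints a new code after a single failed lookup,
exactly as the proposal describes. A stricter variant might hold
unmatched names in a review queue instead of creating them immediately.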
> Another source could be Charts of Accounts, existing lists (like those
> that exist on PB already), budget documents, and structured spending data,
> e.g. from OpenSpending.
> Dealing with duplicates:
> This will probably lead to some duplicates being created. There could be
> some manual reconciliation for this. Organisations could have a primary
> identifier and several secondary identifiers that were used by duplicate
> Dealing with changing organisations:
> Organisations can be created / deleted / merged in the real world. This
> should probably lead to:
> a) created - a new identifier gets created;
> b) merged - a new identifier gets created for the new organisation; and
> (manually) the old organisations are linked / related to the new
> c) deleted - the identifier continues to exist, because old (and possibly
> future) data will still refer to it. However, it should be (manually)
> marked as no longer existing, pointing to a successor organisation if one
> exists (with some flag to explain whether it's a wholly .
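
The duplicate and lifecycle handling described above could be modelled
with a record along these lines. This is a sketch only: the OrgRecord
fields, the resolve and mark_closed helpers, and the codes used in any
example are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class OrgRecord:
    code: str                      # primary identifier
    active: bool = True
    ended: Optional[date] = None
    # Codes of successor organisations, if any (hypothetical field).
    successors: list = field(default_factory=list)
    # Codes that were created for duplicates of this organisation.
    secondary_codes: list = field(default_factory=list)


def resolve(records: dict, code: str) -> OrgRecord:
    """Map a primary or secondary code to its record."""
    if code in records:
        return records[code]
    for rec in records.values():
        if code in rec.secondary_codes:
            return rec
    raise KeyError(code)


def mark_closed(records: dict, code: str, ended: date,
                successor: Optional[str] = None) -> None:
    """Flag an organisation as no longer existing; never delete its code."""
    rec = records[code]
    rec.active = False
    rec.ended = ended
    if successor is not None:
        rec.successors.append(successor)
```

The key design choice is that closing an organisation never removes its
record, so older data that still refers to the code keeps resolving.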
> 1) Does this sound sensible? Is it a good idea? Is there a better
> 2) Will the fuzzy matching be accurate enough to be useful? Is it likely
> to assign organisations an incorrect code?
> 3) How should the identifiers be identified as being created by Public
> Bodies - just a prefix like "PB-"?
> OECD-DAC codelists:
> IATI Standard:
> Mark Brough
> Aid Information Advisor, Publish What You Fund
> Skype: mark-brough - Twitter: @mark_brough