This is an idea that I've been thinking about for a while. I discussed it
with Rufus a week or so ago and wanted to share it with the list to see
what everyone thinks.

The short version: could public bodies be used to generate usable
organisation identifiers?

Pasted below, but also submitted as an issue here:

Look forward to hearing your thoughts...


The IATI Standard is an XML-based format for sharing detailed information
about aid projects. Fundamentally, the model shows resource flows from one
organisation to another, with various classifications in between and many
financial transactions as part of each project. For example:

activity (DFID -> World Health Organisation)
  - transaction (GBP 500 disbursed on 2013-05-01)
  - transaction (GBP 500 disbursed on 2013-07-05)
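
To make the shape of that model concrete, here is a rough sketch in Python.
It is not the IATI XML schema: the class and field names (Activity,
Transaction, provider, receiver, kind) are made up for illustration, and
only the values come from the example above.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass
class Transaction:
    currency: str   # e.g. "GBP"
    value: Decimal  # amount in that currency
    kind: str       # e.g. "disbursed" (hypothetical field name)
    on: date        # transaction date


@dataclass
class Activity:
    provider: str   # organisation the resources flow from
    receiver: str   # organisation the resources flow to
    transactions: list[Transaction] = field(default_factory=list)

    def total(self, currency: str) -> Decimal:
        """Sum the transactions recorded in one currency."""
        return sum(
            (t.value for t in self.transactions if t.currency == currency),
            Decimal("0"),
        )


activity = Activity(
    provider="DFID",
    receiver="World Health Organisation",
    transactions=[
        Transaction("GBP", Decimal("500"), "disbursed", date(2013, 5, 1)),
        Transaction("GBP", Decimal("500"), "disbursed", date(2013, 7, 5)),
    ],
)
print(activity.total("GBP"))
```

The point is only that both ends of the flow (provider and receiver) are
organisations, so both need a usable identifier.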

For the private sector and NGOs, the methodology for uniquely identifying
organisations is:

[Jurisdiction]-[National registration body]-[Number]
e.g. for Oxfam GB, registered at the Charity Commission, with reg number

For governments, the following methodology is used:
[Jurisdiction]-[OECD/DAC Agency code]
e.g. for the UK's Department for International Development:

For multilaterals, we use the following methodology:
[OECD/DAC Channel code]
e.g. for the World Bank's International Development Association (IDA):
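
As a rough illustration of how the three patterns above compose, here is a
small Python sketch. The function names are my own, and the values used
("XX", "REG", "0000") are placeholders, not real jurisdiction, registry,
agency or channel codes.

```python
def ngo_identifier(jurisdiction: str, registration_body: str, number: str) -> str:
    """[Jurisdiction]-[National registration body]-[Number]"""
    return f"{jurisdiction}-{registration_body}-{number}"


def government_identifier(jurisdiction: str, agency_code: str) -> str:
    """[Jurisdiction]-[OECD/DAC Agency code]"""
    return f"{jurisdiction}-{agency_code}"


def multilateral_identifier(channel_code: str) -> str:
    """[OECD/DAC Channel code], used on its own."""
    return str(channel_code)


# Placeholder values only; none of these are real codes.
print(ngo_identifier("XX", "REG", "0000"))
print(government_identifier("XX", "0"))
print(multilateral_identifier("00000"))
```

All three depend on somebody else maintaining a complete list of codes,
which matters for the government and multilateral cases.
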

Agency codes
* Agency codes only include donor agencies. So the Ministry of Finance in
Botswana, for example, does not have a code.
* Agency codes don't even include all donor agencies: for example, parts of
the European Commission or the United States, even though they give aid,
don't have their own identifier - they're categorised under "Miscellaneous".
* The process for adding new agency codes is slow (even if it took a day,
that might be too long).

Channel codes
* Channel codes only contain a subset of all of the multilateral /
international / intergovernmental organisations in the world, and many of
them are not listed in a very usable way. For example, the World Health
Organisation has two codes:
a) World Health Organisation - core voluntary contributions account
b) World Health Organisation - assessed contributions
--> but there isn't one for just "World Health Organisation", for example
if you're contracting them to deliver a project.

Many organisations publishing IATI data will therefore struggle to provide
unique organisation identifiers for many of the public sector /
international organisations that they are working with.

* Official lists of organisations should be used if possible.
* Official lists of organisations don't exist in most cases.
* The exact identifier assigned to an organisation is not fundamentally
important (whether it's BW-1 or BW-21, the Botswana Ministry of Finance
just needs a code).
* Organisation identifiers should be cross-mapped to other codes /
identifiers for those organisations so that the data is easily

Fuzzy reconciliation / text matching of organisations, with an API that
assigns an existing identifier where available, and creates a new one where
it's not available.

1) Organisations (initially, preferably those with a large amount of data)
throw four key pieces of data at the API:
* organisation name (text) - e.g. MINISTRY OF FINANCE
* organisation country (code) - e.g. BW (for Botswana)
* language (code) - e.g. en
* last recorded transaction with this organisation (date) - e.g. 2013-07-05

2) The API responds with one of the following (possibly using HTTP status
codes):
a) Organisation found => use code "BW-1"
b) Organisation not found => created code "BW-21"

It also stores the data about the last recorded transaction, so that
other people know that the organisation *may have* existed on that date.
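
Steps 1 and 2 could be sketched roughly as follows. This is an in-memory toy
in Python, not a proposed implementation: the Registry class, the 0.85
similarity threshold, the use of difflib for the fuzzy matching, and the
choice of 200 for "found" and 201 for "created" are all my assumptions.

```python
from datetime import date
from difflib import SequenceMatcher


class Registry:
    """Toy reconciliation service: match an existing code or mint a new one."""

    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold  # assumed cut-off for "same organisation"
        self.orgs = {}              # identifier -> stored record
        self.counters = {}          # country code -> last number issued

    def _new_identifier(self, country: str) -> str:
        self.counters[country] = self.counters.get(country, 0) + 1
        return f"{country}-{self.counters[country]}"

    def reconcile(self, name: str, country: str, language: str,
                  last_transaction: date):
        """Return (status, identifier): 200 if found, 201 if created."""
        wanted = " ".join(name.lower().split())  # fold case and whitespace
        best_id, best_score = None, 0.0
        for identifier, org in self.orgs.items():
            if org["country"] != country:
                continue  # only compare within the same country
            score = SequenceMatcher(None, wanted, org["normalised"]).ratio()
            if score > best_score:
                best_id, best_score = identifier, score

        if best_id is not None and best_score >= self.threshold:
            org = self.orgs[best_id]
            # Keep the most recent date on which the organisation was seen.
            org["last_seen"] = max(org["last_seen"], last_transaction)
            return 200, best_id

        identifier = self._new_identifier(country)
        self.orgs[identifier] = {
            "name": name,
            "normalised": wanted,
            "country": country,
            "language": language,
            "last_seen": last_transaction,
        }
        return 201, identifier


registry = Registry()
print(registry.reconcile("MINISTRY OF FINANCE", "BW", "en", date(2013, 7, 5)))
print(registry.reconcile("Ministry of Finance", "BW", "en", date(2013, 8, 1)))
```

The first call creates a code and the second finds it again despite the
different capitalisation. A real service would need much better name
normalisation (abbreviations, translations, word order) than this.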

Other sources could be Charts of Accounts, existing lists (like those that
exist on PB already), budget documents, and structured spending data, e.g.
from OpenSpending.

Dealing with duplicates:
This will probably lead to some duplicates being created. There could be
some manual reconciliation for this. Organisations could have a primary
identifier and several secondary identifiers that were used by duplicate

Dealing with changing organisations:
Organisations can be created / deleted / merged in the real world. This
should probably lead to:
a) created - a new identifier gets created;
b) merged - a new identifier gets created for the new organisation; and
(manually) the old organisations are linked / related to the new
c) deleted - the identifier continues to exist, because old (and possibly
future) data will still refer to it. However, it should be (manually)
marked as no longer existing, pointing to a successor organisation if one
exists (with some flag to explain whether it's a wholly .
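
The merged and deleted cases could be recorded along these lines. This is
only a Python sketch: OrgRecord, merge and close are hypothetical names, the
"XX-" identifiers are placeholders, and marking merged organisations as no
longer active is my assumption rather than something settled above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OrgRecord:
    identifier: str
    name: str
    active: bool = True
    successor: Optional[str] = None  # identifier of the successor, if any
    predecessors: list[str] = field(default_factory=list)


def merge(records: dict, old_ids: list, new_id: str, new_name: str) -> OrgRecord:
    """Create a record for the merged organisation and link the old ones to it."""
    new = OrgRecord(new_id, new_name, predecessors=list(old_ids))
    records[new_id] = new
    for old_id in old_ids:
        records[old_id].active = False  # assumption: merged bodies cease to exist
        records[old_id].successor = new_id
    return new


def close(records: dict, identifier: str, successor: Optional[str] = None) -> OrgRecord:
    """Mark an organisation as no longer existing; its identifier is never removed."""
    record = records[identifier]
    record.active = False
    record.successor = successor
    return record


records = {
    "XX-1": OrgRecord("XX-1", "Ministry A"),
    "XX-2": OrgRecord("XX-2", "Ministry B"),
}
merge(records, ["XX-1", "XX-2"], "XX-3", "Ministry A and B")
print(records["XX-1"].successor)
```

Nothing is ever deleted from the table, so old data that refers to XX-1 can
still be looked up and followed to its successor.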

1) Does this sound sensible? Is it a good idea? Is there a better
2) Will the fuzzy matching be accurate enough to be useful? Is it likely to
assign organisations an incorrect code?
3) How should the identifiers be identified as being created by Public
Bodies - just a prefix like "PB-"?

OECD-DAC codelists:
IATI Standard:

