[okfn-labs] Fw: EU Transparency Register

Friedrich Lindenberg friedrich.lindenberg at okfn.org
Wed Feb 29 20:25:50 UTC 2012


Re-sending as list was borked.  


Forwarded message:

> From: Friedrich Lindenberg <friedrich.lindenberg at okfn.org>
> To: Chris Taggart <countculture at gmail.com>
> Cc: okfn-labs at lists.okfn.org
> Date: Monday, February 20, 2012 12:31:05 PM
> Subject: Re: EU Transparency Register
> 
> Hey Chris (and Labs!),
> 
> On Sun, Feb 19, 2012 at 3:07 PM, Chris Taggart <countculture at gmail.com> wrote:
> > I've actually done an importer for the EU transparency register. It works OK,
> > though obviously it can only reconcile entities in territories we have data
> > for (we should have Belgium soon).
> > 
> > You should also be aware of the EU ISA Core Business Vocabulary[1], which
> > was published late Friday. I'll be blogging about it hopefully Monday. As
> > part of doing this (which I was heavily involved with) we looked at company
> > types. As the document makes clear, normalising company types is not a
> > trivial problem, and even when company types have the same name they are not
> > necessarily the same thing.
> > 
> > However I think XBRL Europe is doing some work here, and it's a medium term
> > goal for OpenCorporates to solve this (if no-one else does).
> > 
> > Re your list, the only thing I'd say is that the legal status field can't be
> > normalised. This is partly because it is a manually filled-in field, partly
> > because it isn't a canonical source (it's not uncommon for UK PLCs to
> > switch to Ltd and back again), and partly because the values are not mutually
> > exclusive (e.g. non-profits can also be private companies).
> > 
> 
> 
> I'm playing this game manually at the moment, and the current state is this:
> 
> Category                                    Entries
> Academic Institution                             20
> Association (other)                             181
> Charity                                          18
> Cooperative                                      13
> Foundation                                       39
> Natural Person                                   51
> No legal form                                    61
> Non-profit Association                          572
> Non-profit Company                               62
> Partnership                                       3
> Private Company                                 202
> Public Body                                      31
> Public Company                                  155
> Trade Unions and Professional Associations      112
> (blank)                                         803
> 
> Of course, that's a terrible oversimplification in most cases, and
> there are around 500 entries for which there is no useful
> information at all (my favorite entry so far was "Legal Status: good.").
> 
> My ultimate goal in this is very modest: I want a rough way to cluster
> these organisations in the user interface (with a caveat), not
> necessarily offer a perfectly useful taxonomy.
> 
> > So, it's difficult. Obviously.
> > 
> > The importer I've done 1) extracts the name and tries to reconcile to a
> > company 2) extracts other data, including phone number etc and register ID.
> > This register ID will allow users to query OpenCorporates by asking, e.g.,
> > which data/companies have register number XXXX in register 'eufts'.
> > 
> 
> 
> Just to clarify: FTS is the Financial Transparency System, no? That is
> about Commission expenditure, as opposed to the Transparency Register,
> which is about people.
> 
> > So, not sure that helps.
> > 
> > The main thing I'd say is that fundamentally normalising this is a
> > multi-stage process.
> > 1) extract the raw organisation type (useful to search against as a raw text
> > field)
> > 2) try to reconcile the entity
> > 3) get the canonical org type from the reconcile entry
> > 
> 
> 
> Hm, so far I have 394 matched, 481 unmatched and 23506 with not enough
> information (i.e. natural persons, countries not covered by OpenCorporates
> and civil society orgs). So there's still quite a bit to do.
> 
> > For what it's worth, we're going through a similar thought process with
> > company officers. We're doing some normalising (correcting spelling mistakes,
> > common abbreviations) before indexing in Solr, but not yet trying to
> > normalise president (say) to chairman.
> > 
> > By the way, Public/Private company is not a bad split, but there are
> > companies (e.g. in Australia) that fall into neither!
> > Chris
> > 
> 
> 
> I now have some additional categories, but I am considering merging some
> of them (e.g. partnership into private company).
> 
> Thanks for your help!
> 
> - Friedrich
> 
> 
> > [1] https://joinup.ec.europa.eu/asset/core_business/release/02
> > 
> > On 19 February 2012 13:25, Friedrich Lindenberg
> > <friedrich.lindenberg at okfn.org> wrote:
> > > 
> > > Hi Chris,
> > > 
> > > I'm now digging into the EU transparency register in some more detail,
> > > which turns out to be really interesting. Here [1] is an export of all
> > > the legalStatus fields set for representatives. I assume you've seen
> > > much worse. Eventually, I'd like to end up with a two-level construct
> > > here: a cleansed version of this field (How many ways are there to say
> > > AISBL?) and a rough taxonomy by which to classify these. It's the
> > > second part I need your help with: this should be relatively
> > > human-readable, yet not false. It will cover not only interest
> > > representatives, but also their clients. Here's my first (very naive!)
> > > go. Maybe you have a minute for some feedback:
> > > 
> > > Natural person
> > > Private company
> > > Public company
> > > Cooperative company
> > > Public body
> > > Non-registered association
> > > Association (generic)
> > > Association (trade union)
> > > Association (non-profit)
> > > Association (other charity)
> > > Association (foundation)
> > > 
> > > Obviously this has more than one dimension and is a complete mess.
> > > 
> > > Hope you have a great time at NICAR!
> > > 
> > > - Friedrich
> > > 
> > > [1]
> > > https://github.com/pudo/lobbytransparency/blob/master/etl/legalStatus.csv
> > > 
> > 
> > 
> > 
> > 
> > 
> > --
> > -------------------------------------------------------
> > OpenCorporates :: The Open Database of the Corporate World
> > http://opencorporates.com
> > OpenlyLocal :: Making Local Government More Transparent
> > http://openlylocal.com
> > Blog: http://countculture.wordpress.com
> > Twitter: http://twitter.com/CountCulture
> > 
> 
> 
> 
> 

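Chris's three-step approach (keep the raw organisation type as searchable text, try to reconcile the entity, then take the canonical type from the reconciled record) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the OpenCorporates importer: the `REGISTRY` dict is a hypothetical stand-in for a real reconciliation service, and `RAW_TO_CATEGORY`, `cleanse`, `reconcile` and `classify`, along with all the sample data, are hypothetical names and values chosen for illustration. The `cleanse` step is one simple way to collapse variant spellings such as "A.I.S.B.L." and "aisbl" into a single key.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical coarse taxonomy keyed on cleansed legal-status strings.
# Illustrative mappings only; they are not taken from the register.
RAW_TO_CATEGORY: Dict[str, str] = {
    "aisbl": "Non-profit Association",
    "asbl": "Non-profit Association",
    "ltd": "Private Company",
    "gmbh": "Private Company",
    "plc": "Public Company",
}

# Hypothetical stand-in for a reconciliation service: cleansed name ->
# (company id, canonical company type). Not a real OpenCorporates API.
REGISTRY: Dict[str, Tuple[str, str]] = {
    "acmeholdings": ("example/0001", "Public Company"),
}


@dataclass
class Entity:
    name: str
    raw_legal_status: str


@dataclass
class Classified:
    name: str
    raw_legal_status: str      # step 1: kept verbatim for free-text search
    cleansed_status: str       # variant spellings collapsed to one key
    matched_id: Optional[str]  # step 2: reconciled company id, if any
    category: Optional[str]    # step 3: canonical type, else rough fallback


def cleanse(raw: str) -> str:
    """Collapse variants such as 'A.I.S.B.L.' and 'aisbl' to one key."""
    ascii_text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]", "", ascii_text.lower())


def reconcile(name: str) -> Optional[Tuple[str, str]]:
    """Step 2: look the entity up by name (exact match on the cleansed key)."""
    return REGISTRY.get(cleanse(name))


def classify(entity: Entity) -> Classified:
    key = cleanse(entity.raw_legal_status)
    match = reconcile(entity.name)
    if match is not None:
        # Step 3: trust the registry's type over the self-reported field.
        company_id, canonical_type = match
        return Classified(entity.name, entity.raw_legal_status, key,
                          company_id, canonical_type)
    # Unmatched: fall back to the self-reported field for rough clustering;
    # unrecognised values such as "good." stay uncategorised (None).
    return Classified(entity.name, entity.raw_legal_status, key,
                      None, RAW_TO_CATEGORY.get(key))


if __name__ == "__main__":
    for e in [
        Entity("Acme Holdings", "Ltd."),
        Entity("Example Network", "A.I.S.B.L."),
        Entity("Jane Doe", "good."),
    ]:
        print(classify(e))
```

In this sketch a reconciled record's type overrides the self-reported "Ltd.", reflecting the point that the legal status field is not a canonical source; the keyword fallback only gives the rough UI clustering Friedrich describes, not a reliable taxonomy.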
