[wdmmg-dev] Mark's AidData

Mon Jun 20 12:04:01 UTC 2011

Hi Mark,

On Mon, Jun 20, 2011 at 12:26 PM, Mark Brough
<mark.brough at publishwhatyoufund.org> wrote:
> Great. Actually, this is good timing, as I got confused about what I was
> trying to do with normalising the countries and organisations and ended up
> with an almost infinite process. So I'm going to try and re-work most of the
> import controller. But I'm also starting to feel like me building a
> complicated relational database is maybe not really worth it to just show
> some nice pictures...
>
> Wednesday sounds good. I'm going to Budapest on Wednesday evening but most
> of the day would be fine.
>
> CSV Mapping
> I don't have the CSV mapping, I just parse the XML directly into the
> database in a massive function, you can see it here:
> https://github.com/markbrough/IATI-Data/blob/master/app/controllers/iatiregistry_controller.rb

Wow, that's one impressive function - very nice, although it could of
course be 7 functions ;)

> I didn't try to import the IATI data into OpenSpending because I couldn't
> get existing packages to work, so I figured creating my own would be even
> less likely! But I'll have a think about:
> a) What else would need to be added to or changed in iati2csv (and your
> mapping) to make it complete and hopefully work for DFID/WB/any future IATI
> data

Basically we'd need to implement a kind of extension stage where
regions and sectors are normalized (I'm usually using Google Docs for
this kind of thing, see the attached script that runs against the
German federal budget to add classification details and colors).
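
A rough sketch of what such an extension stage could look like, assuming
the lookup table has been exported from the spreadsheet as CSV. The
function names and the column names (sector, sector_label, color) are
made up for illustration and are not taken from the attached script:

```python
import csv
import io


def load_lookup(fh, key_column):
    """Read a lookup table (e.g. a spreadsheet exported as CSV), keyed by one column."""
    return {row[key_column].strip().lower(): row for row in csv.DictReader(fh)}


def extend_rows(rows, lookup, source_field, extra_fields):
    """Yield a copy of each row with extra classification fields from the lookup."""
    for row in rows:
        key = (row.get(source_field) or "").strip().lower()
        match = lookup.get(key, {})
        extended = dict(row)
        for field in extra_fields:
            # Leave the field empty when no mapping exists, so gaps stay visible.
            extended[field] = match.get(field, "")
        yield extended


# Hypothetical sample data, only to show the shape of the input and output.
LOOKUP_CSV = "sector,sector_label,color\nedu,Education,#aa0000\n"
lookup = load_lookup(io.StringIO(LOOKUP_CSV), "sector")
rows = [{"sector": "EDU", "amount": "100"}, {"sector": "xyz", "amount": "5"}]
for extended in extend_rows(rows, lookup, "sector", ["sector_label", "color"]):
    print(extended)
```

Unmatched keys are passed through with empty fields rather than dropped,
so the missing mappings can be added to the spreadsheet afterwards.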

> b) If there's any information in an activity which is not shared by all the
> transactions - I think there might be but not sure. And also, whether this
> matters.

I haven't seen that yet, but it would be very interesting if it did in
fact exist. It wouldn't be impossible to fix though (I'm currently
overwriting things like default-aid-type with aid-type when they are
specified in the transaction).
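
To illustrate the override, here is a minimal sketch. It is not the
actual iati2csv code, and it assumes that both default-aid-type and
aid-type carry their value in a "code" attribute; the sample XML is
invented:

```python
import xml.etree.ElementTree as ET


def transaction_aid_types(activity):
    """Return one aid type code per transaction in an activity element.

    The activity-level default applies unless the transaction has its own
    aid-type element, in which case the transaction value wins.
    """
    default = activity.find("default-aid-type")
    default_code = default.get("code") if default is not None else None
    codes = []
    for tx in activity.findall("transaction"):
        specific = tx.find("aid-type")
        # Compare against None explicitly: an element without children is falsy.
        codes.append(specific.get("code") if specific is not None else default_code)
    return codes


# Invented sample activity, assuming the "code" attribute convention.
SAMPLE = """
<iati-activity>
  <default-aid-type code="C01"/>
  <transaction><value>100</value></transaction>
  <transaction><aid-type code="D02"/><value>50</value></transaction>
</iati-activity>
"""

print(transaction_aid_types(ET.fromstring(SAMPLE.strip())))
```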

> Import CSV/XML
> I take the point about the maintenance nightmare - although at the same
> time, it would be nice for there to be some way to update reasonably easily
> from the IATI Registry as:
> a) new donors publish (should be another 7 or so by November)
> b) existing donors update their data (DFID last updated about 2 weeks ago -
> I think they do so every month).
> c) I'm thinking about building an example CSV to IATI converter - where you
> upload your aid data (e.g. Estonia/Norway/PEPFAR) and map it to IATI fields
> and it gives it back to you in IATI XML. Is that a good idea?

That's a really cool idea - I have some data from EuropeAid and I
think we looked at Spain together. Given a reasonably simple CSV to
IATI converter we might be able to do some of their work for them and
thus get a nicer database and more chances to compare different
countries' efforts.

As for updating: once we have the scripts to download, normalize and
import, I think it's very realistic to just combine them into a shell
script or Makefile so they become a consistent pipeline. We're also
thinking about using more sophisticated ETL tools or even integrating
some of this into CKAN, but as far as I know none of this is ready
yet.
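
The same chaining idea, sketched in Python rather than as a shell script
or Makefile. The three step functions are placeholders with made-up data
and stand in for the real download, normalize and import scripts:

```python
def download(_):
    # Placeholder: a real step would fetch the published files.
    return [{"recipient": " uganda ", "amount": "100"}]


def normalize(records):
    # Placeholder: tidy one field so the steps have something to pass along.
    return [dict(r, recipient=r["recipient"].strip().title()) for r in records]


def load(records):
    # Placeholder: a real step would write to the database; here we just count.
    return len(records)


def run_pipeline(steps, initial=None):
    """Run the steps in order, feeding each step's output into the next.

    Any exception stops the run, so later steps never see partial data.
    """
    data = initial
    for name, step in steps:
        print("running step:", name)
        data = step(data)
    return data


loaded = run_pipeline([("download", download), ("normalize", normalize), ("load", load)])
print("records loaded:", loaded)
```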

> On the other hand, I guess more manual processing does sound like it could
> be better for tidying up data before import. And there are some cool
> possibilities, like pulling in geo-coded data for each WB project (which
> isn't in their IATI data but it is normally in the Mapping for Results data)
> via this: http://api.worldbank.org/api/projects -- example:
> http://search.worldbank.org/api/projects?qterm=*:*&fl=id,location&countrycode[]=IN&format=json

Amazing :-) Re manual work, I think we do want to reduce this to none
as soon as we know the precise steps that are required on any given
dataset - do it manually once and then automate.

> My errors with OS
> Re my installation of OpenSpending (on Ubuntu 11.04), looking at the Uganda
> dataset, this works fine:
> http://127.0.0.1:5000/dataset/uganda/dimension/from
> http://127.0.0.1:5000/dataset/uganda/dimension/to
>
> This gives error 500 (attached error from the paster and solr consoles):
> http://127.0.0.1:5000/dataset/uganda

This looks like Solr was not available - did you set the Solr URL in
your .ini file?

> (I ran paster load uganda (with some new-ish but not the final data) and got
> no errors, these are the last few lines:
> 2011-06-20 10:32:46,910 INFO  [wdmmg.lib.loader] uganda loaded 11000 in
> 0.89s
> 2011-06-20 10:32:47,181 INFO  [wdmmg.lib.cubes] compute cube for dataset
> 'uganda', cube name: 'default', dimensions: 'to, from, swg,
> sector_objective, year'
> 2011-06-20 10:32:49,786 INFO  [wdmmg.lib.cubes] Done. Took: 2s
>
> I tried paster load cra with the CRA dataset as well and it looked like
> everything was going OK until I got this error:
> IOError: [Errno 2] No such file or directory:
> '/home/pwyf/env/wdmmg/pylons_data/getdata/ukgov-finances-cra/nuts1_population_2006.csv'
> )

Ah, I think you may need to run the install_data script, which will
download all relevant pieces of data for CRA (like the population
statistics mentioned here).

- Friedrich
-------------- next part --------------
A non-text attachment was scrubbed...
Name: expand.py
Type: application/octet-stream
Size: 3360 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/openspending-dev/attachments/20110620/27238eb4/attachment-0001.obj>