[ckan-discuss] Options to store cleaned-up data

Rufus Pollock rufus.pollock at okfn.org
Sun Oct 31 21:57:52 GMT 2010


On 31 October 2010 21:27, Tim McNamara <paperless at timmcnamara.co.nz> wrote:
> I've been working with a few government spreadsheets. Their quality is
> ...variable. I've spent some time cleaning things up, however departments
> seem to be very reluctant to update their records with something infected
> by the outside world. Do you think that there is any scope for CKAN to
> hold blobs of data as alternative sources?

Absolutely - this was precisely one of the things we anticipated
happening as the data ecosystem got richer and it's already happening.

There are two ways to proceed in these kinds of cases:

a) Add resources to the 'official' package
b) Create a 'derived' package representing your 'cleaned-up' version of the data

To give some concrete examples:

For (a):

COFOG: <http://ckan.net/package/cofog> - here the primary 'resource'
is not the link to the 'official' data but to the dataset we
(wheredoesmymoneygo) created by extracting the material and converting
it to CSV (the original is in an Access .mdb file that you have to get
through an obscure link, etc.)

For (b):

Open Library data: Original: <http://ckan.net/package/openlibrary>,
Derived: <http://ckan.net/package/talis-openlibrary>
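
To make the difference between (a) and (b) concrete, here is a rough
sketch using plain Python dictionaries. It does not call any CKAN API;
the field names ('resources', 'extras', 'derived_from'), the helper
functions and the example.org URLs are all assumptions made up for
illustration.

```python
# Illustrative sketch only: the dictionary layout, helper names and URLs
# below are hypothetical, not a confirmed CKAN schema or API.


def add_cleaned_resource(package, url, fmt, description):
    """Option (a): return a copy of the package with the cleaned-up
    resource listed first, ahead of the official one."""
    resource = {"url": url, "format": fmt, "description": description}
    updated = dict(package)
    updated["resources"] = [resource] + list(package.get("resources", []))
    return updated


def make_derived_package(original, name, url, fmt, description):
    """Option (b): build a separate package that records which
    package it was derived from."""
    return {
        "name": name,
        "notes": "Derived from the package '%s'." % original["name"],
        "resources": [{"url": url, "format": fmt, "description": description}],
        "extras": {"derived_from": original["name"]},
    }


official = {
    "name": "example-dept-spending",
    "resources": [
        {
            "url": "http://example.org/official.xls",
            "format": "xls",
            "description": "Official spreadsheet",
        }
    ],
}

option_a = add_cleaned_resource(
    official, "http://example.org/cleaned.csv", "csv", "Cleaned-up CSV"
)
option_b = make_derived_package(
    official,
    "example-dept-spending-cleaned",
    "http://example.org/cleaned.csv",
    "csv",
    "Cleaned-up CSV",
)
```

With (a) everything stays under one package name and the official
link is still listed; with (b) the official package is left untouched
and the link back to it lives in the derived package.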

Regards,

Rufus


