[okfn-discuss] Datahub and CKAN?

Rufus Pollock rufus.pollock at okfn.org
Fri Mar 6 17:10:01 UTC 2009

2009/3/4 Lukasz Szybalski <szybalski at gmail.com>:
> Hello,
> As some of you might know I run a project called datahub.
> "Datahub is a tool that allows faster download/crawl, parse, load, and
> visualize of data. It achieves this by allowing you to divide each
> step into its own work folders. In each work folder you get a sample
> files that you can start coding."
> http://lucasmanual.com/mywiki/DataHub

Sounds nice, and like it might have something in common with both the
data packages/bundles in our Open Economics project and our datapkg.

For example, here's the 'data bundle' for the Millennium Development Goals:


The data.py file has code for getting the data, parsing it etc etc.

Here's datapkg: <http://www.okfn.org/datapkg/>

Among other things, datapkg has a create command for creating a basic
set of 'package' files on disk (see the $ datapkg man command for more).

> There were some discussion in collaboration of ckan and datahub. The
> main goal as I see datahub right now is to create tools for getting,
> parsing, manipulating and possibly visualizing data.  If every project
> that is listed here: http://www.ckan.net/package/list had a
> corresponding package that I could download, run some command which
> would get the data, run another command to parse and load the data,
> then data mining would allow us to do so much more without the
> overhead of getting,parsing and loading the data.

We share a similar dream :) CKAN has a nice REST API:


And there's a Python implementation that talks to this:


datapkg also has facilities for talking to CKAN in order to register
and download material, though these are in a fairly alpha state (see $
datapkg man).

However, as should be clear from browsing around CKAN, not all packages
there have a 'download url', and when they do it isn't usually
something that is packaged (usually just a tar.gz or the like). That
said, I definitely think things should move in the direction you suggest.
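
To make the 'download url' point concrete, here is a rough sketch of
picking out the packages that an automated get/parse/load step could
start from. The records and the field name "download_url" are
assumptions made up for illustration, not the actual CKAN schema or
API output.

```python
# Hypothetical package records; the field names are assumptions,
# not the actual CKAN schema.
packages = [
    {"name": "mdg", "download_url": "http://example.org/mdg.tar.gz"},
    {"name": "no-download", "download_url": ""},
    {"name": "missing-field"},
]


def downloadable(records):
    """Return the records that carry a non-empty download URL."""
    return [r for r in records if r.get("download_url")]


# Names of the packages a fetch step could actually work with.
names = [r["name"] for r in downloadable(packages)]
print(names)
```

Anything filtered out here would still need a maintainer to supply or
package the data before tools could fetch it.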

In fact, there have been discussions here for a while about the idea of
having 'data package maintainers' a la Debian who maintain CKAN packages
and do the job of converting the raw material into a more standardized
form (in the way that Debian maintainers 'package' up the underlying
software libraries and applications).


