[okfn-discuss] Datahub and CKAN?

Rufus Pollock rufus.pollock at okfn.org
Fri Mar 6 17:10:01 UTC 2009


2009/3/4 Lukasz Szybalski <szybalski at gmail.com>:
> Hello,
> As some of you might know I run a project called datahub.
>
> "Datahub is a tool that allows faster download/crawl, parse, load, and
> visualize of data. It achieves this by allowing you to divide each
> step into its own work folders. In each work folder you get a sample
> files that you can start coding."
>
> http://lucasmanual.com/mywiki/DataHub

Sounds nice, and like it might have something in common with both the
data packages/bundles in our Open Economics project and our datapkg
utility.

For example, here's the 'data bundle' for the Millennium Development Goals:

<http://knowledgeforge.net/econ/hg/file/d1275e3592b1/econdata/mdg/>
<http://knowledgeforge.net/econ/hg/file/d1275e3592b1/econdata/mdg/data.py>

The data.py file has code for getting the data, parsing it, etc.

Here's datapkg: <http://www.okfn.org/datapkg/>

Among other things datapkg has a create command for creating a basic
set of 'package' files on disk (see the $ datapkg man command for more
info).

> There were some discussion in collaboration of ckan and datahub. The
> main goal as I see datahub right now is to create tools for getting,
> parsing, manipulating and possibly visualizing data.  If every project
> that is listed here: http://www.ckan.net/package/list had a
> corresponding package that I could download, run some command which
> would get the data, run another command to parse and load the data,
> then data mining would allow us to do so much more without the
> overhead of getting,parsing and loading the data.

We share a similar dream :) CKAN has a nice REST API:

  <http://www.ckan.net/api/rest/>

And there's a python implementation that talks to this:

  <http://project.knowledgeforge.net/ckan/svn/ckanclient/trunk/>
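As a rough sketch of what talking to the API directly might look like
(this is not ckanclient itself): the snippet below assumes that package
records live under a 'package' path beneath the REST base URL and come
back as JSON. The path layout and the helper names package_url and
fetch_json are assumptions made for illustration, so check them against
the API before relying on them.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL of the CKAN REST API mentioned above.
BASE_URL = "http://www.ckan.net/api/rest/"


def package_url(name=None, base=BASE_URL):
    """Build the URL for the package register, or for one package.

    Assumption: packages live under '<base>package/<name>'.
    """
    url = base.rstrip("/") + "/package"
    if name is not None:
        # Percent-encode the name so it is safe as a single path segment.
        url += "/" + quote(name, safe="")
    return url


def fetch_json(url, opener=urlopen):
    """GET a URL and decode the response body as JSON.

    'opener' is injectable so the function can be exercised offline.
    """
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

Under those assumptions, fetch_json(package_url()) would return the list
of package names and fetch_json(package_url(name)) a single record.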

datapkg also has facilities for talking to CKAN in order to register
and download material, though these are in a fairly alpha state (see
$ datapkg man).

However, as should be clear from browsing around CKAN, not all packages
there have a 'download url', and when they do it usually isn't
something that is packaged (usually just a tar.gz or the like). That
said, I definitely think things should move in the direction you suggest.
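
To get a feel for how many packages are affected, one could split
package records by whether they carry a download url. This is only a
sketch: it assumes each record is a dict with a 'name' key and an
optional 'download_url' key (the key names are an assumption, as is the
helper name split_by_download_url).

```python
def split_by_download_url(records):
    """Split package records into names with and without a download url.

    Assumption: each record is a dict with a 'name' key and an optional
    'download_url' key; a missing, None or blank value counts as absent.
    """
    with_url, without_url = [], []
    for record in records:
        url = (record.get("download_url") or "").strip()
        if url:
            with_url.append(record["name"])
        else:
            without_url.append(record["name"])
    return with_url, without_url
```

Even for the packages that land in the first list, the url may just
point at a raw archive, so this only gives an upper bound on what could
be fetched automatically.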

In fact there have been discussions here for a while about the idea of
having 'data package maintainers' a la Debian, who maintain CKAN
packages and do the job of converting the raw material into a more
standardized form (in the way that Debian maintainers 'package' up the
underlying software libraries and applications).

Regards,

Rufus



