[open-science] OKF tools: ckan.org, thedatahub.org

Rufus Pollock rufus.pollock at okfn.org
Tue Apr 3 19:34:05 UTC 2012


On 2 April 2012 23:21, Tom Roche <Tom_Roche at pobox.com> wrote:
>
> Jessy Kate Schingler Mon, 2 Apr 2012 11:23:52 -0700 (rearranged)
>> is there a reason people find ckan/thedatahub insufficient for data
>> management needs?

Thanks for bringing this up, Jessy -- I've been meaning to post to the
list about CKAN and the DataHub for a while. I'll try to pull together a
proper summary ASAP, but in the meantime, here are answers to the
specific queries.

> You're presupposing folks even know about these tools. (Never overlook
> ignorance as a cause of behavior :-) Before your post I hadn't seen
> either. I'm still generally ignorant regarding the OKF--being US-based

That's definitely not your fault :-) -- we (the Open Knowledge
Foundation's CKAN / DataHub project) have done a pretty poor job of
telling people about these tools :-)

> probably doesn't help, but mostly I'm head-down in my work, coming up
> occasionally to "scratch itches," like, why am I cut'n' pasting so
> @#$%^&! much? (A: because I can't send OP links to content on my
> current wiki, because its firewall's admins are such a PITA.) What I
> know, and hence seems useful to me in this domain, are sites/tools
> like github, google, sourceforge (etc) which I've used.

Exactly.

>> is it related to technical/features, or to people's familiarity and
>> confidence around the longevity of the site?
>
> Regarding CKAN, it's probably
>
> http://ckan.org/solutions/pricing/
>>> FREE[:] Deploy your own community instance hosted on your servers.

Right, but just below that is a section about http://theDataHub.org/,
which can host your datasets for you right now!

> If I had my own servers, my life would be rather different. I might be
> more productive, but I don't have time to be an admin (famous last
> words). I would also hafta eat pet food to afford
>
>>> CKAN Catalogue[:] from $400 / €300 a month
>
> (which might also degrade my productivity, though maybe not :-)
>
>> i'm starting to learn about [thedatahub.org], it seems rather
>> perfect for data set management, and even has change lists for
>> data sets, groups, user pages, etc.
>
> I'm unclear on the relationship between thedatahub.org and CKAN:
> instance to framework? If so, what is thedatahub.org's data store, and

### Relation of theDataHub.org and CKAN

CKAN = software
theDataHub.org = Site (open to all) powered by CKAN

You probably care more about theDataHub than CKAN -- but if you were a
journal or research group wanting to manage your own data, then you
might be more interested in CKAN.

More info: theDataHub.org is the community data hub that the Open
Knowledge Foundation runs, powered by CKAN. (CKAN was in fact originally
developed just to power theDataHub.org, but other people have found the
software useful too, hence ckan.org!)

Simply put:

CKAN = Open Source Data Hub / Data Management System (DMS) Software
DataHub = Data Hub Site/Service - a place where you host datasets and
work with data

Analogy with GitHub: DataHub is like GitHub, and CKAN is like the
software that runs GitHub (except that GitHub's software is,
unfortunately, *not* open source)

### What's the DataStore

There are two types of things you usually want to store:

* Files (as blobs) -- be they data dumps, images, fMRI scans etc.
* Structured data (i.e. data stored in a database where you can access
individual cells and rows)

DataStore is a feature of CKAN (and hence of the DataHub) that does
the latter. From the docs [1]: "The CKAN [DataHub] DataStore provides
a database for structured storage of data together with a powerful
Web-accessible Data API, all seamlessly integrated into the CKAN
[DataHub] interface and authorization system."

[1]: <http://docs.ckan.org/en/latest/datastore.html> (there's also a
nice short slideshow)

The former is provided by the DataHub's (CKAN's) FileStore, where you
can store whole files. The FileStore and DataStore interconnect: if you
upload a CSV file, it is automatically imported into a DataStore table
and becomes available via the web Data API. See more here:

http://ckan.org/2012/03/27/ckan-datastore-and-data-api/
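To make the files-vs-structured-data distinction concrete, here is a
rough Python sketch of what "importing a CSV into a table" buys you:
once the rows are in a database, individual cells can be queried
instead of re-downloading and re-parsing the whole file. This is purely
illustrative and assumes SQLite as a stand-in store; it is *not* how
CKAN's DataStore is implemented, and the function `import_csv` and the
table name `measurements` are hypothetical.

```python
import csv
import io
import sqlite3


def import_csv(conn, table, csv_text):
    """Load CSV text (a 'blob') into a new table; returns the row count.

    Hypothetical helper: every column is stored as TEXT for simplicity,
    and header names are assumed to be plain identifiers.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first line gives the column names
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


# The file as a whole: opaque unless you parse it yourself.
csv_blob = "site,year,value\nA,2010,1.5\nB,2011,2.0\n"

# After import, individual rows and cells are addressable via queries.
conn = sqlite3.connect(":memory:")
n = import_csv(conn, "measurements", csv_blob)
cell = conn.execute(
    "SELECT value FROM measurements WHERE site = ?", ("B",)
).fetchone()[0]
print(n, cell)
```

A hosted DataStore adds the parts this sketch leaves out: access over
the web via the Data API, and integration with the site's
authorization system.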

> how is that priced?
>
> I suspect these tools would be quite attractive to folks with my needs
> but more $. Unfortunately, being presently near bottom of both the
> academic foodchain and the income distribution drives many decisions

The DataHub is currently completely free (as in beer -- and also free
as in freedom!). At some point we may need to introduce some kind of
charging model or run a funding drive to ensure sustainability [2], but
right now it costs absolutely nothing, and we hope that for data
(especially open data) under a certain size it will always remain so!

Rufus

[2]: running a site where you store and serve lots of data costs
money, plus we want to add new features :-)