[okfn-discuss] ANN: datapkg (v0.5) - a tool for distributing, discovering and installing data "packages"

Rufus Pollock rufus.pollock at okfn.org
Tue Feb 23 15:54:25 UTC 2010


Datapkg 0.5 has been released:
<http://blog.okfn.org/2010/02/23/introducing-datapkg/>

This is the first release deemed suitable for public consumption
(though still alpha)! This announcement therefore serves as both an
introduction and a release announcement.


## Introduction

datapkg is a user tool for distributing, discovering and installing
data (and content) 'packages'.

datapkg is a simple way to 'package' data, building on existing
packaging tools developed for code (e.g. Debian apt, PyPI, CRAN, Gems,
CPAN). datapkg is designed to integrate closely with CKAN (the
Comprehensive Knowledge Archive Network).

In terms of the big picture, datapkg is the "apt-get/aptitude/dpkg"
part of the vision for a 'Debian of Data' (i.e. scalable, distributed,
open data infrastructures! -- for more see [this post][comp-post] or
[these recent slides][ccc-slides]):

  <http://m.okfn.org/files/talks/media/debian_of_data.png>

[comp-post]: http://blog.okfn.org/2007/04/30/what-do-we-mean-by-componentization-for-knowledge/
[ccc-slides]: http://m.okfn.org/files/talks/ccc_20091228/

Datapkg is a key part of making data sharing **automatable**. As an
end-user tool it allows automated (command-line or scripted)
discovery, installation and sharing of data "packages" either
standalone or via interaction with a registry like CKAN.


## Trying it out

If you're interested in giving it a spin, here are the install instructions:

  <http://knowledgeforge.net/ckan/doc/datapkg/install.html>

Once you've got it running you can then do things like (for more see
the docs: http://knowledgeforge.net/ckan/doc/datapkg/):

> Search for a package in an Index e.g. on CKAN.net::
>
>     # let's search for iso country/language codes data (iso 3166 ...)
>     $ datapkg search ckan:// iso
>     ...
>     iso-3166-2-data -- Linked ISO 3166-2 Data
>     ...
>
> Get some information about one of them (in this case 2-digit ISO country codes in RDF)::
>
>     $ datapkg info ckan://iso-3166-2-data
>     ....
>     ....
>
> Let's install it (to the current directory)::
>
>     $ datapkg install ckan://iso-3166-2-data .
>
> This will download the Package 'iso-3166-2-data' together with its "Resources" and unpack it into a directory named 'iso-3166-2-data'.
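
Since the point of datapkg is to make these steps scriptable, here is a minimal sketch of driving the same commands from Python. It only shells out to the `info` and `install` invocations shown above; the helper names `datapkg_command` and `fetch_package` are made up for illustration and are not part of datapkg itself.

```python
import shlex
import subprocess


def datapkg_command(action, *args):
    """Build the argv list for one datapkg invocation (hypothetical helper)."""
    return ["datapkg", action, *args]


def fetch_package(spec, dest=".", run=subprocess.check_call):
    """Print info for a package spec, then install it into dest.

    `run` is injectable so the sequence can be dry-run or tested
    without datapkg being installed.
    """
    commands = (
        datapkg_command("info", spec),
        datapkg_command("install", spec, dest),
    )
    for cmd in commands:
        # Echo the command in the same form you would type it in a shell.
        print("$ " + " ".join(shlex.quote(part) for part in cmd))
        run(cmd)
```

With datapkg on your PATH, calling `fetch_package("ckan://iso-3166-2-data")` should run the same two steps as the session above. Passing `run=print` instead gives a dry run that only lists the commands.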


## Extending

datapkg is intended to be a generic tool for data packaging. As such,
we want it to deal with as many "distribution" formats and as many
different registries as possible. We've therefore designed datapkg to
be extensible so that it can easily be adapted to talk with other
systems. What kinds of plugins might one write?

  * A plugin to discover data "packages" from RDFa information in
    web-pages, especially those in Government data catalogues (suggested
    by Ed Summers <http://inkdroid.org/journal/about/>)
  * A plugin to talk to Ensembl <http://www.ensembl.org/>
  * A plugin to extract download urls or SPARQL endpoints from VoID
    descriptions (suggested by Richard Cyganiak
    <http://dowhatimean.net/>)
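
To give a feel for what "talking to other systems" could look like, here is a rough sketch of dispatching on the scheme of a spec such as `ckan://`. This is NOT datapkg's actual plugin API: `IndexBase`, `InMemoryIndex`, `register_index`, `index_for` and the `demo://` scheme are all hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit


class IndexBase:
    """Hypothetical interface: one subclass per kind of registry."""

    def search(self, query):
        raise NotImplementedError


class InMemoryIndex(IndexBase):
    """Toy index backed by a dict, standing in for a real registry."""

    def __init__(self, packages):
        self.packages = dict(packages)  # package name -> title

    def search(self, query):
        q = query.lower()
        return sorted(
            name
            for name, title in self.packages.items()
            if q in name.lower() or q in title.lower()
        )


# Maps a URI scheme to a factory that builds an index for a spec.
INDEX_TYPES = {}


def register_index(scheme, factory):
    """Make a new kind of index available under the given scheme."""
    INDEX_TYPES[scheme] = factory


def index_for(spec):
    """Pick the index plugin based on the scheme part of the spec."""
    scheme = urlsplit(spec).scheme
    if scheme not in INDEX_TYPES:
        raise ValueError("no index plugin registered for scheme %r" % scheme)
    return INDEX_TYPES[scheme](spec)


# A plugin author would register their own scheme in a similar way.
register_index(
    "demo",
    lambda spec: InMemoryIndex({"iso-3166-2-data": "Linked ISO 3166-2 Data"}),
)
```

Under these assumptions, `index_for("demo://").search("iso")` looks up the toy index by scheme and searches it. A plugin for VoID or RDFa would supply a different `IndexBase` subclass behind its own scheme.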

**We're looking for more such suggestions as well as for people who'd
like to implement plugins.** If you're interested please get in touch:
<http://www.okfn.org/contact/>



