[ckan-discuss] ANN: datapkg (v0.5) - a tool for distributing, discovering and installing data "packages"

Rufus Pollock rufus.pollock at okfn.org
Thu Feb 25 23:07:38 GMT 2010


Dear Ed, Richard and other interested parties,

You may have seen the datapkg v0.5 announcement the other day, which is
forwarded below (apologies for the cross-posting if you have), but I
wanted to highlight in particular the mention at the end of that
announcement of how extensions might work (I hope I have summarized
your suggestions correctly):

<quote>
datapkg is intended to be a generic tool for data packaging. As such,
we want it to deal with as many "distribution" formats and as many
different registries as possible. We've therefore designed datapkg to
be extensible so that it can easily be adapted to talk with other
systems. What kinds of plugins might one write?

 * A plugin to discover data "packages" from RDFa information in
web pages, especially those in Government data catalogues (suggested
by Ed Summers <http://inkdroid.org/journal/about/>)
 * A plugin to extract download URLs or SPARQL endpoints from VoID
descriptions (suggested by Richard Cyganiak
<http://dowhatimean.net/>)
</quote>

I was wondering how we might go about implementing one of these and,
in the process, defining the minimal informational "hooks" a data
package tool needs in order to work (e.g. a download URL or a SPARQL
endpoint).
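One way to pin those hooks down is to write them as a small record and see what a VoID description can fill in. The sketch below is purely illustrative Python (the language datapkg itself is written in) and is not part of datapkg: `PackageHooks` and `hooks_from_void` are hypothetical names, and the parser only understands N-Triples statements whose objects are IRIs. `void:dataDump` and `void:sparqlEndpoint` are the VoID terms for a dump download and a SPARQL endpoint.

```python
import re
from dataclasses import dataclass, field

# Full IRIs of the two VoID terms of interest (namespace http://rdfs.org/ns/void#).
VOID_DATA_DUMP = "http://rdfs.org/ns/void#dataDump"
VOID_SPARQL_ENDPOINT = "http://rdfs.org/ns/void#sparqlEndpoint"

# One N-Triples statement whose subject, predicate and object are all IRIs:
#   <subject> <predicate> <object> .
# Literals, blank nodes and comment lines simply fail to match.
IRI_TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


@dataclass
class PackageHooks:
    """Hypothetical minimal record of how a tool can get at a dataset."""
    name: str
    download_urls: list[str] = field(default_factory=list)
    sparql_endpoints: list[str] = field(default_factory=list)

    def is_usable(self) -> bool:
        # A package tool needs at least one way in: a dump or an endpoint.
        return bool(self.download_urls or self.sparql_endpoints)


def hooks_from_void(name: str, ntriples: str) -> PackageHooks:
    """Fill a PackageHooks record from a VoID description in N-Triples form."""
    hooks = PackageHooks(name)
    for line in ntriples.splitlines():
        match = IRI_TRIPLE.match(line)
        if match is None:
            continue  # skip blanks, comments and statements with literal objects
        _subject, predicate, obj = match.groups()
        if predicate == VOID_DATA_DUMP:
            hooks.download_urls.append(obj)
        elif predicate == VOID_SPARQL_ENDPOINT:
            hooks.sparql_endpoints.append(obj)
    return hooks
```

A record with neither a dump nor an endpoint reports `is_usable()` as false, which is one way of phrasing "minimal": at least one of the two hooks must be present.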

Regards,

Rufus

---------- Forwarded message ----------
From: Rufus Pollock <rufus.pollock at okfn.org>
Date: 23 February 2010 15:54
Subject: ANN: datapkg (v0.5) - a tool for distributing, discovering
and installing data "packages"
To: okfn-discuss <okfn-discuss at lists.okfn.org>


Datapkg 0.5 has been released:
<http://blog.okfn.org/2010/02/23/introducing-datapkg/>

This is the first release deemed suitable for public consumption
(though still alpha)! This announcement therefore serves as both an
introduction and a release announcement.


## Introduction

datapkg is a user tool for distributing, discovering and installing
data (and content) 'packages'.

datapkg is a simple way to 'package' data building on existing
packaging tools developed for code (e.g. Debian apt, PyPI, CRAN, Gems,
CPAN). datapkg is designed to integrate closely with CKAN (the
Comprehensive Knowledge Archive Network).

In terms of the big picture, datapkg is the "apt-get/aptitude/dpkg"
part of the vision for a 'Debian of Data' (i.e. scalable, distributed,
open data infrastructures! -- for more see [this post][comp-post] or
[these recent slides][ccc-slides]):

 <http://m.okfn.org/files/talks/media/debian_of_data.png>

[comp-post]: http://blog.okfn.org/2007/04/30/what-do-we-mean-by-componentization-for-knowledge/
[ccc-slides]: http://m.okfn.org/files/talks/ccc_20091228/

Datapkg is a key part of making data sharing **automatable**. As an
end-user tool it allows automated (command-line or scripted)
discovery, installation and sharing of data "packages" either
standalone or via interaction with a registry like CKAN.
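As a sketch of what scripted use could look like, the Python below drives the command-line tool through `subprocess`, using only the three command forms shown under "Trying it out" (`search`, `info` and `install`). The helper names are hypothetical, and the default runner assumes `datapkg` is installed and on the `PATH`.

```python
import subprocess
from typing import Callable, Sequence

# Only the command forms shown in this announcement are used:
#   datapkg search ckan:// <query>
#   datapkg info ckan://<name>
#   datapkg install ckan://<name> <dest>


def search_cmd(query: str) -> list[str]:
    return ["datapkg", "search", "ckan://", query]


def info_cmd(name: str) -> list[str]:
    return ["datapkg", "info", f"ckan://{name}"]


def install_cmd(name: str, dest: str = ".") -> list[str]:
    return ["datapkg", "install", f"ckan://{name}", dest]


def run_checked(argv: list[str]) -> None:
    # Raises subprocess.CalledProcessError if datapkg exits non-zero.
    subprocess.run(argv, check=True)


def install_all(names: Sequence[str], dest: str = ".",
                run: Callable[[list[str]], object] = run_checked) -> list[list[str]]:
    """Install each named CKAN package into dest, one after another.

    Returns the commands that were run, in order.
    """
    issued = []
    for name in names:
        argv = install_cmd(name, dest)
        run(argv)
        issued.append(argv)
    return issued
```

Passing a different `run` (for example `print`) turns a batch into a dry run, which is handy before pointing a script at a live registry.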


## Trying it out

If you're interested in giving it a spin, here are the install instructions:

 <http://knowledgeforge.net/ckan/doc/datapkg/install.html>

Once you've got it running you can then do things like (for more see
the docs: http://knowledgeforge.net/ckan/doc/datapkg/):

> Search for a package in an index, e.g. on CKAN.net::
>
>     # let's search for iso country/language codes data (iso 3166 ...)
>     $ datapkg search ckan:// iso
>     ...
>     iso-3166-2-data -- Linked ISO 3166-2 Data
>     ...
>
> Get some information about one of them (in this case 2-digit ISO country codes in RDF)::
>
>     $ datapkg info ckan://iso-3166-2-data
>     ....
>     ....
>
> Let's install it (to the current directory)::
>
>     $ datapkg install ckan://iso-3166-2-data .
>
> This will download the Package 'iso-3166-2-data' together with its "Resources" and unpack it into a directory named 'iso-3166-2-data'.
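The download step of `install` can be pictured roughly as follows. This is a hypothetical Python sketch, not datapkg's implementation: it assumes the package's resource URLs are already known, saves each one under `<dest>/<package>/` using the last URL path segment as the file name, and does not unpack archives. `fetch` defaults to the standard library's `urllib.request.urlretrieve` and can be replaced for testing.

```python
import os
import posixpath
import urllib.request
from typing import Callable
from urllib.parse import urlparse


def resource_paths(package: str, dest: str, resource_urls: list[str]) -> dict[str, str]:
    """Map each resource URL to a file path inside <dest>/<package>/."""
    package_dir = os.path.join(dest, package)
    paths = {}
    for i, url in enumerate(resource_urls):
        # Last segment of the URL path becomes the file name; URLs ending in
        # "/" get a numbered fallback. Name clashes are not handled here.
        filename = posixpath.basename(urlparse(url).path) or f"resource-{i}"
        paths[url] = os.path.join(package_dir, filename)
    return paths


def install(package: str, dest: str, resource_urls: list[str],
            fetch: Callable[[str, str], object] = urllib.request.urlretrieve) -> dict[str, str]:
    """Create <dest>/<package>/ and download every resource into it (no unpacking)."""
    paths = resource_paths(package, dest, resource_urls)
    os.makedirs(os.path.join(dest, package), exist_ok=True)
    for url, path in paths.items():
        fetch(url, path)
    return paths
```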


## Extending

datapkg is intended to be a generic tool for data packaging. As such,
we want it to deal with as many "distribution" formats and as many
different registries as possible. We've therefore designed datapkg to
be extensible so that it can easily be adapted to talk with other
systems. What kinds of plugins might one write?

 * A plugin to discover data "packages" from RDFa information in
web pages, especially those in Government data catalogues (suggested
by Ed Summers <http://inkdroid.org/journal/about/>)
 * A plugin for Ensembl <http://www.ensembl.org/>
 * A plugin to extract download URLs or SPARQL endpoints from VoID
descriptions (suggested by Richard Cyganiak
<http://dowhatimean.net/>)
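To make the RDFa suggestion a little more concrete, here is a deliberately simplified Python sketch that scans a catalogue page for links marked with download-like relations. It is not an RDFa processor: it compares `rel`/`property` tokens literally instead of resolving prefixes, and the relation names in `DOWNLOAD_RELS` are an assumption about what such catalogues might use.

```python
from html.parser import HTMLParser

# Assumed relation names that mark a link as a data download. A real RDFa
# processor would expand these CURIEs via prefix declarations; this sketch
# compares the attribute tokens literally.
DOWNLOAD_RELS = {"dcat:downloadURL", "dcat:accessURL", "void:dataDump"}


class DownloadLinkFinder(HTMLParser):
    """Collect link targets from elements whose rel/property names a download relation."""

    def __init__(self, rels=DOWNLOAD_RELS):
        super().__init__()
        self.rels = set(rels)
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel and property may each hold several space-separated tokens;
        # attributes without a value come through as None.
        tokens = set((attrs.get("rel") or "").split())
        tokens |= set((attrs.get("property") or "").split())
        if tokens & self.rels:
            # Prefer href, then RDFa's resource attribute, then src.
            target = attrs.get("href") or attrs.get("resource") or attrs.get("src")
            if target:
                self.links.append(target)


def find_download_links(html: str) -> list[str]:
    finder = DownloadLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links
```

The URLs it returns are exactly the "download URL" hook a package tool needs, so a plugin along these lines could hand them straight to an install step.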

**We're looking for more such suggestions as well as for people who'd
like to implement plugins.** If you're interested please get in touch:
<http://www.okfn.org/contact/>


