[open-science] [Open Manufacturing] Fwd: [get.theinfo] ANN: datapkg (v0.5) - a tool for distributing, discovering and installing data "packages"

Wed Feb 24 08:58:13 UTC 2010

On 23 February 2010 21:13, Bryan Bishop <kanzure at gmail.com> wrote:
> On Tue, Feb 23, 2010 at 3:09 PM, John Griessen wrote:
>> Bryan Bishop wrote:
>>>
>>> ---------- Forwarded message ----------
>>> From: Rufus Pollock <rufus.pollock at okfn.org>
>>> Date: Tue, Feb 23, 2010 at 9:55 AM
>>> Subject: [get.theinfo] ANN: datapkg (v0.5) - a tool for distributing,
>>> discovering and installing data "packages"
>>
>> Know how this handles versions or version control systems?
>
> There's already some room in the spec for versions of the data
> packages, but at the moment I don't know whether or not the data
> packages themselves are git, mercurial or some other dvcs
> repositories.

Right. The basic way datapkg works is that it expects:

a) some basic metadata (name, title, version etc) -- most of this is
optional though recommended
b) a "download url" (or possibly multiple ones ...)

This is very similar (nothing accidental here!) to the way things like
the Python Package Index work (and associated tools like easy_install
or pip). This download url can point either to a distribution
(specially formatted tar.gz or zip) or, nowadays, just an svn, git or
mercurial repo (depending on the plugins). This is a little different
from, say, Debian where you really only ever have one download url and
it's guaranteed to work.

I think the more flexible method is probably a must for us given the
current very non-standardized setup for data, and hence we've gone for
a simple download_url approach with extensibility via plugins (the
default is just to try to retrieve the url -- plugins could alter this
by, for example, realizing the url represents a vcs and performing the
relevant checkout).
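
To make the idea concrete, here is a rough sketch of that dispatch in
Python. It is an illustration only, not datapkg's actual code: the
PackageMetadata class, the handler functions, plan_retrieval and the
URL conventions they check for (a ".git" suffix, an "hg+" prefix) are
all assumptions made up for the example. It only decides *how* a url
would be retrieved; it does not download or check out anything.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical metadata record: the field names follow the description
# above, not datapkg's real schema. Only the download url is required.
@dataclass
class PackageMetadata:
    name: str
    download_url: str
    title: str = ""
    version: str = ""

# A plan is (method, url). A handler returns a plan if it recognizes
# the url, or None to let the next handler (or the default) decide.
Plan = Tuple[str, str]
Handler = Callable[[str], Optional[Plan]]

def git_handler(url: str) -> Optional[Plan]:
    # Assumed convention: git urls use the git:// scheme or end in ".git".
    if url.startswith("git://") or url.endswith(".git"):
        return ("git", url)
    return None

def hg_handler(url: str) -> Optional[Plan]:
    # Assumed convention: mercurial urls carry an "hg+" prefix.
    if url.startswith("hg+"):
        return ("hg", url[len("hg+"):])
    return None

def plan_retrieval(pkg: PackageMetadata, plugins: List[Handler]) -> Plan:
    # Give each plugin a chance to claim the url, in order.
    for handler in plugins:
        plan = handler(pkg.download_url)
        if plan is not None:
            return plan
    # Default behaviour: just try to retrieve the url as a plain file.
    return ("download", pkg.download_url)
```

With no plugins installed every url falls through to the plain
download; adding a handler to the list changes what happens for the
urls it recognizes and leaves everything else alone.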

Regards,

Rufus