[okfn-dev] Thinking more about datapkg

Matthew Brett matthew.brett at gmail.com
Wed Jan 12 18:28:46 UTC 2011


Hi,

Rufus and I sat down for a while over the new year to think about
datapkg design.

This email is not really a summary of that discussion, but rather some
thoughts that came to me afterwards.  I think we are hoping for
feedback.

One thing we discussed was the idea of treating the set of metadata
about a package as a 'catalog entry'.

I have been playing with that idea.

Maybe a data package can be any collection of bytes, for which the
only necessary criteria are: we know how to get the bytes, and we know
how to get the name.

Start with an example.

I've got some files in an archive named

mydata-0.3.tar.gz

I know how to get the bytes (because it's a tar.gz file), and the
'name' is 'mydata-0.3'.  In this case, the catalog entry can be
compiled by guessing from the filename:

name = mydata-0.3
format = tar.gz

Implied are:

revision =
version =
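To make the guessing step concrete, here is a minimal shell sketch.
The function name guess_entry and the set of recognised extensions are
assumptions for illustration, not something datapkg currently
provides; fields that can't be guessed are simply left empty.

```shell
# Sketch only: derive a minimal catalog entry by guessing from a filename.
# guess_entry and the extension list below are hypothetical.
guess_entry() {
    file=$(basename "$1")
    case "$file" in
        *.tar.gz) format=tar.gz; name=${file%.tar.gz} ;;
        *.tgz)    format=tar.gz; name=${file%.tgz} ;;
        *.zip)    format=zip;    name=${file%.zip} ;;
        *)        echo "cannot guess format of $file" >&2; return 1 ;;
    esac
    # Fields that cannot be guessed are left empty ("implied").
    printf 'name = %s\nformat = %s\nrevision =\nversion =\n' "$name" "$format"
}

guess_entry mydata-0.3.tar.gz
```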

To publish 'mydata-0.3.tar.gz', I can write this trivial catalog
entry, or ask datapkg to write it, and then just add where the data
can be fetched from:

name = mydata-0.3
format = tar.gz
url = http://www.mydomain.org/files/mydata-0.3.tar.gz
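In this simple case the whole publishable entry can be derived from
the download URL alone.  A minimal sketch, assuming the URL ends in
<name>.tar.gz; make_entry is a hypothetical helper, not a datapkg
command:

```shell
# Sketch only: build the publishable catalog entry from the download URL.
# make_entry is hypothetical and only handles URLs ending in .tar.gz.
make_entry() {
    url=$1
    file=${url##*/}                 # e.g. mydata-0.3.tar.gz
    case "$file" in
        *.tar.gz) ;;
        *) echo "expected a .tar.gz URL, got $file" >&2; return 1 ;;
    esac
    printf 'name = %s\nformat = tar.gz\nurl = %s\n' "${file%.tar.gz}" "$url"
}

make_entry http://www.mydomain.org/files/mydata-0.3.tar.gz
```

Redirecting its output to a file gives something that could then be
put in a catalog.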

Now I just have to put this catalog entry somewhere (CKAN, etc.).

To install the data, I can obviously ask datapkg to do it:

datapkg install mydata

sort of thing.

Or I can do this:

wget http://www.mydomain.org/files/mydata-0.3.tar.gz
tar zxvf mydata-0.3.tar.gz
cat >> .datapkg/installed.catalogue << EOF
[mydata-0.3]
format = local
path = /path/where/unpacked
EOF

kind of thing.
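For the manual route to work, whatever reads the installed catalogue
only needs to map a package name to a local path.  A minimal sketch,
assuming the simple "[section]" / "key = value" layout above;
lookup_path is hypothetical and prints nothing if the package is not
listed:

```shell
# Sketch only: find where a package was unpacked by reading the local
# catalogue. lookup_path is hypothetical; it assumes "key = value" lines.
# Example: lookup_path .datapkg/installed.catalogue mydata-0.3
lookup_path() {
    catalogue=$1
    pkg=$2
    awk -v want="[$pkg]" '
        /^\[/ { insec = ($0 == want); next }   # entering a new section
        insec && $1 == "path" && $2 == "=" {
            sub(/^[^=]*=[ \t]*/, "")           # drop the "path = " prefix
            print
            exit
        }
    ' "$catalogue"
}
```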

That means there need be nothing specific about an archive that makes
it a data package.  Of course, I can also make the catalog entry part
of the archive, perhaps (as now) under a standard name, catalog.json
or something like that.
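If the entry can live either inside or outside the archive, a reader
could prefer the embedded file and fall back to guessing.  A sketch,
assuming the embedded entry is named catalog.json (at any depth in the
archive) and that the fallback only understands .tar.gz names;
entry_for is hypothetical.  It relies on tar's -O (extract to stdout)
option, which GNU tar and bsdtar both provide.

```shell
# Sketch only: prefer a catalog entry shipped inside the archive, else
# guess one from the filename. entry_for is hypothetical; the fallback
# assumes a .tar.gz name and does no JSON escaping of the name.
entry_for() {
    archive=$1
    member=$(tar -tzf "$archive" | grep -E '(^|/)catalog\.json$' | head -n 1)
    if [ -n "$member" ]; then
        tar -xzOf "$archive" "$member"     # print the embedded entry
    else
        file=$(basename "$archive")
        printf '{"name": "%s", "format": "tar.gz"}\n' "${file%.tar.gz}"
    fi
}
```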

Anyway, sorry, these thoughts are still not entirely formed, but I
wanted to put them down before they faded.

See you,

Matthew




More information about the okfn-labs mailing list