[okfn-dev] Thinking more about datapkg

Rufus Pollock rufus.pollock at okfn.org
Mon Feb 14 13:31:29 UTC 2011

On 12 January 2011 18:28, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
> Rufus and I sat down for a while over the new year to think about
> datapkg design.
> This email is not really a summary of that discussion, but thoughts
> that came to me after the discussion.  I think we are hoping for
> feedback.

We also had some nice scribbled diagrams -- have you got scans of these, Matthew?

> One thing we discussed was the idea of the set of metadata about the
> package as a 'catalog entry'.
> I was playing with the idea of the catalog entry.
> Maybe a data package can be any collection of bytes, for which the
> only necessary criterion is: we know how to get the bytes; we know how
> to get the name.

I think this is a key point: keep things as simple as possible and
don't assume, as software packaging does, that we are always dealing
with files (the data could just as well sit behind an API).

> Start with an example.
> I've got some files in an archive named
> mydata-0.3.tar.gz
> I know how to get the bytes (because it's a tar.gz file).  The 'name'
> is 'mydata-0.3'.    In this case, the catalog entry can be compiled by
> guessing:
> name = mydata-0.3
> format = tar.gz
> Implied are:
> revision =
> version =
> To publish 'mydata-0.3.tar.gz', I can make this trivial catalog entry,
> or ask datapkg to make it, and then just add where I can get the data
> name = mydata-0.3
> format = tar.gz
> url = http://www.mydomain.org/files/mydata-0.3.tar.gz
> Now I just have to put this catalog entry somewhere (ckan, etc).
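
The guessing step in the example above could look roughly like this in
Python. The function name guess_catalog_entry and the list of recognised
suffixes are made up for illustration; this is a sketch, not datapkg's
actual API.

```python
import json

# Hypothetical list of archive suffixes we know how to unpack.
KNOWN_FORMATS = ("tar.gz", "tar.bz2", "tgz", "zip")


def guess_catalog_entry(filename, url=None):
    """Guess a minimal catalog entry (name, format) from an archive filename.

    If a url is given it is added, which is all that is needed before the
    entry can be published somewhere.
    """
    for fmt in KNOWN_FORMATS:
        suffix = "." + fmt
        if filename.endswith(suffix):
            # The name is whatever is left once the format suffix is removed.
            entry = {"name": filename[:-len(suffix)], "format": fmt}
            break
    else:
        raise ValueError("unrecognised archive format: %r" % filename)
    if url is not None:
        entry["url"] = url
    return entry


entry = guess_catalog_entry(
    "mydata-0.3.tar.gz",
    url="http://www.mydomain.org/files/mydata-0.3.tar.gz",
)
print(json.dumps(entry, indent=2))
```

The sketch leaves revision and version alone, since how those would be
derived from the name is not settled here.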


> That means there need be nothing specific about an archive that
> makes it a data package. But, of course, I can also make the catalog
> entry part of the archive.  That might use (as now) a standard
> name - catalog.json or something.

Other points I remember that are important:

* Distinction between a Package and a PackageRevision - the first is the
abstract thing 'PackageX' and the latter is PackageX at some
version/revision (something I can actually get).

* Use JSON for metadata and catalog file. (I've started work on
converging on JSON in datapkg now that 0.8 is out the door [1])

* Simple index file called catalog.json (we also talked about the
relation between an index of things that could be installed and a list
of things that were installed)
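
To make the points above a bit more concrete, here is a rough sketch in
Python. The class fields, the to_entry and build_catalog helpers, and the
layout of the index (package name mapped to a list of revision entries)
are all assumptions for illustration; nothing here is the agreed
catalog.json format.

```python
import json
from dataclasses import dataclass


@dataclass
class Package:
    """The abstract thing, e.g. 'mydata', independent of any version."""
    name: str


@dataclass
class PackageRevision:
    """A Package at a specific version: something that can actually be fetched."""
    package: Package
    version: str
    format: str
    url: str

    def to_entry(self):
        # Catalog entry in the name/format/url shape used earlier in the thread.
        return {
            "name": "%s-%s" % (self.package.name, self.version),
            "format": self.format,
            "url": self.url,
        }


def build_catalog(revisions):
    """Group revision entries by package name and serialise them as JSON."""
    catalog = {}
    for rev in revisions:
        catalog.setdefault(rev.package.name, []).append(rev.to_entry())
    return json.dumps(catalog, indent=2, sort_keys=True)


mydata = Package("mydata")
rev = PackageRevision(
    mydata, "0.3", "tar.gz",
    "http://www.mydomain.org/files/mydata-0.3.tar.gz",
)
catalog_text = build_catalog([rev])
print(catalog_text)  # this text is what would be written to catalog.json
```

The same structure could describe either the index of things available to
install or the list of things already installed; only where the file lives
would differ.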


[1]: https://bitbucket.org/okfn/datapkg/changeset/00ba7f1c1169
