[okfn-dev] Thinking more about datapkg

Rufus Pollock rufus.pollock at okfn.org
Mon Feb 14 13:31:29 UTC 2011


On 12 January 2011 18:28, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> Rufus and I sat down for a while over the new year to think about
> datapkg design.
>
> This email is not really a summary of that discussion, but thoughts
> that came to me after the discussion.  I think we are hoping for
> feedback.

We also had some nice scribbled diagrams -- have you got scans of these, Matthew?

> One thing we discussed was the idea of the set of metadata about the
> package as a 'catalog entry'.
>
> I was playing with the idea of the catalog entry.
>
> Maybe a data package can be any collection of bytes, for which the
> only necessary criterion is: we know how to get the bytes; we know how
> to get the name.

I think this is a key point: keep things as simple as possible and
don't assume (as with software) that we are always dealing with files
(we could have an API).

> Start with an example.
>
> I've got some files in an archive named
>
> mydata-0.3.tar.gz
>
> I know how to get the bytes (because it's a tar.gz file).  The 'name'
> is 'mydata-0.3'.    In this case, the catalog entry can be compiled by
> guessing:
>
> name = mydata-0.3
>
> format = tar.gz
>
> Implied are:
>
> revision =
> version =
>
> To publish 'mydata-0.3.tar.gz', I can make this trivial catalog entry,
> or ask datapkg to make it, and then just add where I can get the data
>
> name = mydata-0.3
> format = tar.gz
> url = http://www.mydomain.org/files/mydata-0.3.tar.gz
>
> Now I just have to put this catalog entry somewhere (ckan, etc).
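
To make the guessing step concrete, here is a rough sketch in Python. It is not how datapkg does it: the function name, the list of known formats and the rule that a trailing "-<digits>" is the implied version are all assumptions of mine for illustration.

```python
import re

# Hypothetical list of archive suffixes; not taken from datapkg.
KNOWN_FORMATS = ("tar.gz", "tar.bz2", "tgz", "zip")


def guess_catalog_entry(filename, url=None):
    """Guess a minimal catalog entry from an archive filename."""
    for fmt in KNOWN_FORMATS:
        suffix = "." + fmt
        if filename.endswith(suffix):
            # The name is the filename with the format suffix removed.
            name = filename[: -len(suffix)]
            break
    else:
        raise ValueError("unrecognised archive format: %r" % filename)

    entry = {"name": name, "format": fmt}

    # Assumption: a trailing "-<digits and dots>" is the implied version.
    match = re.search(r"-(\d+(?:\.\d+)*)$", name)
    entry["version"] = match.group(1) if match else None

    # The url is the one thing that cannot be guessed from the filename.
    if url is not None:
        entry["url"] = url
    return entry


entry = guess_catalog_entry(
    "mydata-0.3.tar.gz",
    url="http://www.mydomain.org/files/mydata-0.3.tar.gz",
)
```

The only thing the publisher has to supply by hand is the url; everything else falls out of the filename.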

[...]

> That means, that there need be nothing specific about an archive, that
> makes it a data package, but, of course, I can also make the catalog
> entry be part of the archive.  That might be using (as now) a standard
> name - catalog.json or something.

Other points I remember that are important:

* Distinction between a Package and a PackageRevision - the first is the
abstract thing 'PackageX' and the latter is PackageX at some
version/revision (something I can actually get).

* Use JSON for metadata and the catalog file. (I've started work on
converging on JSON in datapkg now that 0.8 is out the door [1])

* Simple index file called catalog.json (we also talked about the
relation between an index of things that could be installed and a list
of things that were installed)
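
A rough Python sketch of how these three points might fit together. Nothing here is settled: the class fields, the build_index helper and the layout of the resulting catalog.json are assumptions of mine for illustration, not what datapkg currently does.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """The abstract thing, e.g. 'mydata', independent of any version."""
    name: str


@dataclass(frozen=True)
class PackageRevision:
    """A Package at a specific version: something I can actually get."""
    package: Package
    version: str
    format: str
    url: str


def build_index(revisions):
    """Group revisions by package name into a JSON-serialisable dict."""
    index = {}
    for rev in revisions:
        index.setdefault(rev.package.name, []).append(
            {"version": rev.version, "format": rev.format, "url": rev.url}
        )
    return index


mydata = Package("mydata")
rev = PackageRevision(
    mydata,
    version="0.3",
    format="tar.gz",
    url="http://www.mydomain.org/files/mydata-0.3.tar.gz",
)

# Hypothetical catalog.json content: one key per Package, each holding
# the list of PackageRevisions that could be installed.
catalog_json = json.dumps(build_index([rev]), indent=2, sort_keys=True)
```

A list of installed things could reuse the same layout, holding only the revisions actually present on disk.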

Rufus

[1]: https://bitbucket.org/okfn/datapkg/changeset/00ba7f1c1169



