[ckan-dev] "Data Package" specification page

Friedrich Lindenberg friedrich.lindenberg at okfn.org
Mon Jul 4 17:19:30 UTC 2011


Hi Matthew,

On Mon, Jul 4, 2011 at 6:45 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> I noticed this statement:
>
> "Data packages are nothing but metadata"
>
> Could you clarify what you mean?

This is probably an oversimplification, but it is part of a larger
discussion: when we talk about a data package, do we mean the sum of
all the referenced data or just the references? There are several
ways in which this could be answered:

1) Include all data, even if it's TBs of stuff (e.g. scientific data) -
the most useful model, but it also raises issues such as: when a
referenced resource changes, how do we know about this?

2) Consider just core metadata or core metadata and
processing/provenance/status/quality metadata.

3) Distinguish between reasonably and unreasonably sized resources
and mark them accordingly. This is then analogous to the various
pass-by-reference/pass-by-value discussions we have in computing
generally.

4) Inline smaller resources into the metadata catalogue (which then
becomes a repository).
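
To make options 3 and 4 a bit more concrete, here is a rough sketch of
how a resource entry could be built either "by value" or "by
reference". This is not anything CKAN does today: the function names,
the field names and the 64 KB cut-off are all made up for
illustration. The checksum is one possible answer to the "how do we
know a referenced resource changed" question from option 1.

```python
import base64
import hashlib

# Hypothetical cut-off between "reasonably" and "unreasonably" sized resources.
INLINE_LIMIT = 64 * 1024


def describe_resource(name, data, url, inline_limit=INLINE_LIMIT):
    """Build a metadata entry for one resource (illustrative field names).

    Small payloads are passed "by value" (inlined, base64-encoded);
    larger ones are passed "by reference" (URL only). Both carry a
    SHA-256 digest so a later fetch can reveal that the content changed.
    """
    entry = {
        "name": name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    if len(data) <= inline_limit:
        entry["mode"] = "value"
        entry["data"] = base64.b64encode(data).decode("ascii")
    else:
        entry["mode"] = "reference"
        entry["url"] = url
    return entry


def has_changed(entry, current_data):
    """Return True if freshly fetched bytes no longer match the recorded digest."""
    return hashlib.sha256(current_data).hexdigest() != entry["sha256"]
```

The point of the sketch is only that the metadata record stays a
single small object in both cases; whether the bytes travel with it is
a per-resource decision.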

I'm not really sure here, but coming from a practical point of view
I'd like to have as much management of my resources as I can get
without having to switch all my other tools and practices. In other
words: CKAN should help me describe what I do, not dictate how I do
it.

> You are proposing (I think) a DVCS frontend to data, as a CLI, where
> the history is stored in an upstream server.  Would this differ from
> standardizing to SVN?  I'm not proposing that, I just wanted to get an
> idea of where you differ...

You don't really need SVN if you're talking about the metadata only -
it's just a single object which can be managed more specifically and
through a nicer, RESTful interface. Once you do start to include the
data, you want a VCS - and I've been spending quite a bit of time
wondering if we shouldn't make CKAN package pages into HG repos the
same way e.g. bitbucket.org pages are (this is trivial for HG, git
should also be possible).

- Friedrich

-- 
Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/

http://twitter.com/pudo
http://pudo.org



