[okfn-labs] Versioning Data

Rufus Pollock rufus.pollock at okfn.org
Wed Jul 3 09:51:26 UTC 2013

On 2 July 2013 17:47, Marianne Bellotti <marianne.bellotti at gmail.com> wrote:
> The problem with using source code version control with data is that while source code transformations tend to be individual text edits (removed these characters and added these) data transformations tend to be functions. Not sure how useful version control will be until you can track and reproduce those kinds of actions.

That's an excellent point, and one the post discusses at some length
in its limitations and alternatives section.

The argument of the post, in a nutshell, is that:

- source code versioning tools may be a relatively poor fit to the
problem domain but their maturity and power outweigh this disadvantage
- there are standard alternative models - that I've personally been
involved in using and building [^note] - including recording
transforms and versioning systems specifically designed for data.
However, at present, no such system - that I know of :-) - has got
widespread adoption or maturity
- that said, given the poor fit of "code versioning", especially for
large data, I imagine there is a real opportunity for alternatives here,
and I'm very excited to hear about them, use them (and contribute to
their development).
- (At the same time, git and mercurial will continue to improve. You
can already use git on file sizes an order of magnitude larger than
~3-5 years ago ...)
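
To make the "recording transforms" alternative a bit more concrete,
here is a minimal sketch in Python. It is only an illustration of the
general idea, not how any existing tool works: the transform names, the
log format and the replay() helper are all made up for this example.
The point is that what gets versioned is the raw data plus an ordered
log of named operations, rather than a text diff of the output.

```python
# Hypothetical transforms: each takes a list of row dicts and returns a new list.
def drop_empty_rows(rows):
    # Keep only rows that have at least one non-blank value.
    return [r for r in rows if any(v.strip() for v in r.values())]

def uppercase_column(rows, column):
    # Return copies of the rows with one column upper-cased.
    return [{**r, column: r[column].upper()} for r in rows]

# Registry mapping the names stored in the log to the functions above.
TRANSFORMS = {
    "drop_empty_rows": drop_empty_rows,
    "uppercase_column": uppercase_column,
}

def replay(rows, log):
    # Apply each recorded step in order; the input list is never mutated.
    for step in log:
        rows = TRANSFORMS[step["op"]](rows, **step.get("args", {}))
    return rows

# The raw data stays untouched; the log is the thing you would version.
raw = [
    {"city": "london", "pop": "8"},
    {"city": "", "pop": ""},
    {"city": "paris", "pop": "2"},
]
log = [
    {"op": "drop_empty_rows"},
    {"op": "uppercase_column", "args": {"column": "city"}},
]
result = replay(raw, log)
```

Because the log is a small piece of structured text, it could itself
live in git or mercurial even when the data it applies to is too large
to diff sensibly.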


[^note]: e.g. colleagues and I implemented RDBMS-based copy-on-write
versioning back in 2007 that's seen ongoing production use. Also, Data
Explorer http://explorer.okfnlabs.org/ is an experiment in a
"simplest-possible" version of the recording-transforms approach (JS
scripts!). There's some write-up of ongoing "research" and proposals
in this area at http://www.dataprotocols.org/en/latest/syncing.html

> -Marianne
> Exversion.com
> Sent from my iPhone
