[okfn-discuss] We Need Distributed Revision/Version Control for Data

Rufus Pollock rufus.pollock at okfn.org
Mon Jul 12 18:08:23 UTC 2010


Today I wrote a post on distributed revision/version control for data:

http://blog.okfn.org/2010/07/12/we-need-distributed-revisionversion-control-for-data/

I'd be very interested to hear any comments people have, or any useful
pointers to existing technology.

Regards,

Rufus

<excerpt>
In the open data community, we need tools for doing distributed
revision/version control for data like the ones that already exist
for code.

(Don’t know what I mean by revision control or distributed revision
control? Read this)

Distributed revision control systems for code, like Mercurial and Git,
have had a massive impact on software development, especially in the
F/OSS community: the distributed methodology works particularly well
with open material.

The same would be true for data. Revision control, and specifically
distributed revision control, would support (cf this and this earlier
post):

* Incremental development: “patches”, changelogs etc
* Provenance tracking: showing who did what, and when, is built into a
revisioning system
* Broader participation: you don’t have to worry (as much) about who
you let in because changes can be reverted. It’s also easier to get
involved because you can have your own independent copy to play around
with (Distributed).
* Easier collaboration: updates don’t mean making a full copy (and
applying updates is automatic), and you can see who is making changes
and when
* Peer-to-peer model: different contributors can work simultaneously
and independently (Distributed). Extra “features” can be added
independently of mainline development with re-integration later
(Distributed).

Because this is all a bit abstract it is worth giving a concrete
example of why “distributed” revision control could be so useful.

...
</excerpt>
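
To make the "patches", provenance and "changes can be reverted" points
in the excerpt a little more concrete, here is a rough sketch of what a
patch over a small keyed dataset might look like. It is only an
illustration, not an existing tool: the names Patch, diff, apply_patch
and revert are hypothetical, the dataset is treated as a plain
key/value mapping, and the figures in the usage note are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    """A changeset between two versions of a keyed dataset (hypothetical)."""
    author: str    # provenance: who made the change
    message: str   # changelog entry describing the change
    added: dict = field(default_factory=dict)    # key -> new value
    removed: dict = field(default_factory=dict)  # key -> old value
    changed: dict = field(default_factory=dict)  # key -> (old, new)


def diff(old, new, author, message):
    """Record only what differs between old and new, not a full copy."""
    patch = Patch(author, message)
    for key in new.keys() - old.keys():
        patch.added[key] = new[key]
    for key in old.keys() - new.keys():
        patch.removed[key] = old[key]
    for key in old.keys() & new.keys():
        if old[key] != new[key]:
            patch.changed[key] = (old[key], new[key])
    return patch


def apply_patch(data, patch):
    """Return a new dataset with the patch applied."""
    result = dict(data)
    result.update(patch.added)
    for key in patch.removed:
        del result[key]
    for key, (_, new_value) in patch.changed.items():
        result[key] = new_value
    return result


def revert(data, patch):
    """Return a new dataset with the patch undone."""
    result = dict(data)
    for key in patch.added:
        del result[key]
    result.update(patch.removed)
    for key, (old_value, _) in patch.changed.items():
        result[key] = old_value
    return result
```

For example, with base = {"uk": 61, "fr": 64} and
edited = {"uk": 62, "fr": 64, "de": 82}, calling
diff(base, edited, "alice", "update figures") yields a patch that
apply_patch turns back into edited and that revert undoes again. A
real system would also need history, merging and conflict handling,
which this sketch leaves out.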



