[okfn-discuss] We Need Distributed Revision/Version Control for Data

Dashamir Hoxha dashohoxha at gmail.com
Fri Jul 16 17:56:10 UTC 2010


I think it is important to distinguish two kinds of data format: source
code and consumable.
For example, the source code of a document can be LaTeX or XML, and the
consumable can be PDF or PS. The source code is compiled in some way to
generate the consumable version.

If we make this distinction, it becomes clear that keeping the binary
versions of documents in version control is not very useful. We should
keep the source code instead.
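
As a rough illustration of this distinction, here is a minimal Python
sketch that picks out the files worth keeping in version control. The
extension lists and the function names (classify, files_to_track) are
only assumptions made for the example, based on the formats mentioned
above; a real project would need its own lists.

```python
from pathlib import PurePath

# Illustrative assumptions only: extensions taken from the formats named above.
SOURCE_EXTENSIONS = {".tex", ".xml"}
CONSUMABLE_EXTENSIONS = {".pdf", ".ps"}


def classify(filename):
    """Return 'source', 'consumable' or 'unknown' based on the file extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in SOURCE_EXTENSIONS:
        return "source"
    if suffix in CONSUMABLE_EXTENSIONS:
        return "consumable"
    return "unknown"


def files_to_track(filenames):
    """Keep only the source files; consumables can be regenerated from them."""
    return [name for name in filenames if classify(name) == "source"]
```

Files classified as "unknown" are left out here, so they would have to
be reviewed by hand.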

Regards,
Dashamir Hoxha

On Mon, Jul 12, 2010 at 8:35 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>
>
> On Mon, Jul 12, 2010 at 7:08 PM, Rufus Pollock <rufus.pollock at okfn.org>
> wrote:
>>
>> Today I wrote a post on distributed revision/version control for data:
>>
>>
>> http://blog.okfn.org/2010/07/12/we-need-distributed-revisionversion-control-for-data/
>>
>> I'd be very interested to hear any comments people have, or any useful
>> pointers to existing technology.
>>
>> Regards,
>>
>> Rufus
>>
> I think this is really important, but I think you are right that it requires
> domain-specific tools (and indeed I think that any data repositories will
> require domain-specific management - diffs are important but not critical).
>
> I have used normal SCM to store some of my data. My problem is that
> updates often take a lot of time even if very little has changed. This is
> partly because the data can differ in insignificant ways which still require
> formal diffs. For example, if a program recalculates data, the new output may
> differ in insignificant digits, but these mandate that the whole data set is
> replaced with a new version.
>
> The main current virtue of SCM for data is that the data are actually
> stored at least once! (Many scientists store their data zero times.)
>
> P.
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>
> _______________________________________________
> okfn-discuss mailing list
> okfn-discuss at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/okfn-discuss
>
>
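
The problem with insignificant digits described in the quoted message
could perhaps be reduced by normalizing the numbers before comparing
two versions. The following Python sketch is only one possible approach,
not an existing tool: the function names (normalize_line,
significant_diff) and the default of 6 significant digits are
assumptions made for the example. It only touches numbers written with
a decimal point, so that integer identifiers are left alone.

```python
import difflib
import re

# Matches decimal numbers such as 1.25, .5, -3.0e-4 (plain integers are skipped).
DECIMAL = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+)(?:[eE][-+]?\d+)?")


def normalize_line(line, sig_digits=6):
    """Rewrite every decimal number in the line with sig_digits significant digits."""
    def _round(match):
        return f"{float(match.group(0)):.{sig_digits}g}"
    return DECIMAL.sub(_round, line)


def significant_diff(old_text, new_text, sig_digits=6):
    """Return a unified diff of the two texts after normalizing their numbers.

    An empty list means the versions differ, at most, in insignificant digits.
    """
    old = [normalize_line(line, sig_digits) for line in old_text.splitlines()]
    new = [normalize_line(line, sig_digits) for line in new_text.splitlines()]
    return list(difflib.unified_diff(old, new, lineterm=""))
```

If significant_diff returns an empty list, the recalculated output
could be skipped instead of being committed as a whole new version.
How many digits are significant depends on the domain, so the threshold
would have to be chosen per data set.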



