[okfn-discuss] We Need Distributed Revision/Version Control for Data

Peter Murray-Rust pm286 at cam.ac.uk
Sat Jul 17 01:21:25 UTC 2010


On Fri, Jul 16, 2010 at 6:56 PM, Dashamir Hoxha <dashohoxha at gmail.com> wrote:

> I think that it is important to divide data formats into source code
> and consumable.
> For example, the source code of a document can be in LaTeX or in XML; the
> consumable can be PDF or PS. The source code of the document is compiled
> somehow in order to generate the consumable version.
>
> If we make this distinction, then it is clear that it is not so much
> useful to keep the binary versions of the documents in version control.
> We should keep the source code instead.
>
> Regards,
> Dashamir Hoxha
>
>
>
In science I tend to use the categories:
* source code (Java, C, etc.)
* documents (Word, LaTeX, ...), and here I agree with you
* data (*.csv, *.netcdf, and many other bespoke formats, e.g. my own CML for
chemistry)

These all have different challenges (and all require different licensing
models). I think data is by far the hardest, as we have to deal with
domain-specific semantics (e.g. I can swop the order of two atoms without
changing the abstraction, so a line-by-line diff would report a change where
there is none). I think there are about 1000 scientific data formats, and we
will have to deal with each of them separately.
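A minimal Python sketch of the atom-order point, assuming a simplified, namespace-free CML-like fragment in which every atom carries a stable `id` and each bond names its two atoms in `atomRefs2` (real CML uses an XML namespace and much richer markup). The fragments and the helpers `semantic_key` and `textual_changes` are hypothetical illustrations, not part of any CML toolkit. The sketch contrasts a plain line diff with a comparison of atom and bond sets that ignores listing order.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical, simplified CML-like fragments (no XML namespace).
# Both describe the same three atoms and two bonds; V2 only lists the
# atoms in a different order and writes one bond's atom references reversed.
V1 = """<molecule id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a2 a3" order="1"/>
  </bondArray>
</molecule>"""

V2 = """<molecule id="m1">
  <atomArray>
    <atom id="a2" elementType="O"/>
    <atom id="a3" elementType="H"/>
    <atom id="a1" elementType="C"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a2 a1" order="1"/>
    <bond atomRefs2="a2 a3" order="1"/>
  </bondArray>
</molecule>"""


def textual_changes(old: str, new: str) -> list[str]:
    """Return the added/removed lines a plain line-based diff would report."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def semantic_key(cml_text: str):
    """Order-independent summary: the set of atoms and the set of bonds.

    Assumes atom ids are stable between versions; bonds are stored as
    unordered pairs so "a1 a2" and "a2 a1" compare equal.
    """
    root = ET.fromstring(cml_text)
    atoms = frozenset((a.get("id"), a.get("elementType")) for a in root.iter("atom"))
    bonds = frozenset(
        (frozenset(b.get("atomRefs2").split()), b.get("order"))
        for b in root.iter("bond")
    )
    return atoms, bonds


if __name__ == "__main__":
    print("line diff reports changes:", bool(textual_changes(V1, V2)))
    print("same molecule:", semantic_key(V1) == semantic_key(V2))
```

Run as a script, the line diff reports changes while the set comparison treats the two versions as the same molecule. If atom ids are themselves renumbered between versions, this set comparison is no longer enough, and a canonical form of the molecular graph would be needed instead.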


>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>

