[okfn-labs] Versioning Data

Marianne Bellotti marianne.bellotti at gmail.com
Tue Jul 2 16:47:43 UTC 2013


The problem with using source code version control for data is that source code changes tend to be individual text edits (remove these characters, add those), while data transformations tend to be functions applied across the whole dataset. I'm not sure how useful version control will be for data until it can track and reproduce those kinds of actions.
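
To make the distinction concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the tiny temperature table, the fahrenheit_to_celsius function, and the idea of an operation log with a replay helper are assumptions, not something git, dat, or any other tool in this thread provides. It only shows that one function applied to a column looks like a rewrite of every line to a text diff, whereas recording the operation itself would let you reproduce it.

```python
import difflib

# Hypothetical dataset stored as line-oriented CSV (one row per line).
before = ["city,temp_f", "Ghent,68", "London,59", "Oakland,77"]


def fahrenheit_to_celsius(rows):
    """Illustrative transformation: one function applied to every row."""
    out = ["city,temp_c"]
    for row in rows[1:]:
        city, temp_f = row.split(",")
        celsius = round((float(temp_f) - 32) * 5 / 9)
        out.append(f"{city},{celsius}")
    return out


after = fahrenheit_to_celsius(before)

# What a text-based tool sees: every line removed and added again.
diff = list(difflib.unified_diff(before, after, lineterm="", n=0))
removed = [l for l in diff if l.startswith("-") and not l.startswith("---")]
added = [l for l in diff if l.startswith("+") and not l.startswith("+++")]

# What you would rather record: the operation, so it can be replayed.
# This log format and the registry are assumptions made for this sketch.
operation_log = [{"op": "fahrenheit_to_celsius"}]
registry = {"fahrenheit_to_celsius": fahrenheit_to_celsius}


def replay(rows, log):
    """Re-apply each logged operation, in order, to the original rows."""
    for step in log:
        rows = registry[step["op"]](rows)
    return rows


if __name__ == "__main__":
    print("\n".join(diff))
    print(f"{len(removed)} lines removed, {len(added)} lines added")
    print(replay(before, operation_log) == after)
```

The diff tells you that every row changed but not why; the one-entry log is much smaller and says exactly what happened, which is roughly the kind of tracking I mean.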

-Marianne
Exversion.com


On Jul 2, 2013, at 12:07 PM, okfn-labs-request at lists.okfn.org wrote:

> Today's Topics:
> 
>   1. Re: dat, a new open data project by Max Ogden (Rufus Pollock)
>   2. Using Git (and Github) for Data - a Data Pattern (Rufus Pollock)
>   3. Re: Using Git (and Github) for Data - a Data Pattern
>      (Pieter Colpaert)
>   4. Re: Using Git (and Github) for Data - a Data Pattern
>      (Rufus Pollock)
>   5. Re: Using Git (and Github) for Data - a Data Pattern
>      (Pieter Colpaert)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Tue, 2 Jul 2013 14:49:37 +0100
> From: Rufus Pollock <rufus.pollock at okfn.org>
> Subject: Re: [okfn-labs] dat, a new open data project by Max Ogden
> To: "todd.d.robbins at gmail.com" <todd.d.robbins at gmail.com>,    Max Ogden
>    <max at maxogden.com>
> Cc: okfn-labs <okfn-labs at lists.okfn.org>
> Message-ID:
>    <CAKssCpNYHmJ6=EBAQe0hzf4NGH3Z+PEyJVejgxLDwU=1Kmh9UQ at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Adding Max himself into the thread ...
> 
> On 1 July 2013 22:20, todd.d.robbins at gmail.com <todd.d.robbins at gmail.com> wrote:
>> https://github.com/maxogden/dat
> 
> Thanks for sending this through - I'd been meaning to ping the
> list about this too (Max had pinged data protocols about it).
> 
> I think the project looks very exciting and, in the first instance,
> recommend folks have a look at the README and the issues:
> 
> <https://github.com/maxogden/dat#dat>
> <https://github.com/maxogden/dat/issues>
> 
> Among other things the project will implement support for the "SLEEP"
> protocol (see http://www.dataprotocols.org/en/latest/sleep.html) with
> local storage in leveldb (if I have understood the docs right!).
> 
> Rufus
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Tue, 2 Jul 2013 15:12:07 +0100
> From: Rufus Pollock <rufus.pollock at okfn.org>
> Subject: [okfn-labs] Using Git (and Github) for Data - a Data Pattern
> To: okfn-labs <okfn-labs at lists.okfn.org>
> Message-ID:
>    <CAKssCpMLO1OiJNkPkSNOd=Hb7P_TBVH9s0z-jMVTRkXkAZeirQ at mail.gmail.com>
> Content-Type: text/plain; charset=windows-1252
> 
> Hi folks,
> 
> I wanted to give a heads up on a new post I've put out today under the
> title of "Git (and Github) for data":
> 
> <http://blog.okfn.org/2013/07/02/git-and-github-for-data/>
> 
> <excerpt>
> The ability to do "version control" for data is a big deal. There are
> various options but one of the most attractive is to reuse existing
> tools for doing this with code, like git and mercurial. This post
> describes a simple "data pattern" for storing and versioning data
> using those tools which we've been using for some time and found to be
> very effective.
> </excerpt>
> 
> The basic pattern is very simple and probably familiar to lots of folks here:
> 
> 1. Store data as line-oriented text, specifically as CSV files.
> "Line-oriented text" just means that each individual unit of the data,
> such as a row of a table (or an individual cell), corresponds to one
> line.
> 
> 2. Use best-of-breed (code) versioning tools like git or mercurial to
> store and manage the data.
> 
> As people know, this is exactly the model in use for a while with
> <https://github.com/datasets> and http://data.okfn.org/
> 
> Regards,
> 
> Rufus
> 
> PS: I should also add that the appearance of this post at the same
> time as Max Ogden's recent Dat efforts is an entirely fortuitous
> coincidence - the original draft of this post was done a while ago
> (polishing for actual publication always gets put off!), though I have
> sought to add in some additional links in light of recent
> developments!
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Tue, 2 Jul 2013 16:20:59 +0200
> From: Pieter Colpaert <pieter.colpaert at okfn.org>
> Subject: Re: [okfn-labs] Using Git (and Github) for Data - a Data
>    Pattern
> To: okfn-labs <okfn-labs at lists.okfn.org>
> Cc: swtf at elis.ugent.be
> Message-ID:
>    <CAP5Wo11xOPUx9bL3D94m3-oiR5yuhwttJ6MQ5PsA1NSKnXS=Bw at mail.gmail.com>
> Content-Type: text/plain; charset="windows-1252"
> 
> Hi Rufus,
> 
> You might also like the "git for triples" we're building in our lab at the
> University of Ghent: http://rawbase.github.io
> 
> Paper from WWW13 can be found here:
> https://t.co/FxedWgR13Y
> 
> Kind regards,
> 
> Pieter
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Tue, 2 Jul 2013 17:04:44 +0100
> From: Rufus Pollock <rufus.pollock at okfn.org>
> Subject: Re: [okfn-labs] Using Git (and Github) for Data - a Data
>    Pattern
> To: Pieter Colpaert <pieter.colpaert at okfn.org>
> Cc: okfn-labs <okfn-labs at lists.okfn.org>, swtf at elis.ugent.be
> Message-ID:
>    <CAKssCpN4gwud0gPxiFgR0OfNawmJiOrEpk6bHD16h66VOb0NbQ at mail.gmail.com>
> Content-Type: text/plain; charset="windows-1252"
> 
> Hi Pieter,
> 
> Thanks for sharing this - I hadn't seen this before and it looks very
> interesting. Do you have a demo site using this by any chance - e.g. a web
> app where edits are versioned into the RDF database running rawbase on top
> of it?
> 
> Rufus
> 
> 
> 
> 
> -- 
> *
> 
> Rufus Pollock
> 
> Founder and Co-Director | skype: rufuspollock |
> @rufuspollock<https://twitter.com/rufuspollock>
> 
> The Open Knowledge Foundation <http://okfn.org/>
> 
> Empowering through Open Knowledge
> http://okfn.org/ | @okfn <http://twitter.com/OKFN> | OKF on
> Facebook<https://www.facebook.com/OKFNetwork>|
> Blog <http://blog.okfn.org/>  |  Newsletter<http://okfn.org/about/newsletter>
> 
> *
> 
> ------------------------------
> 
> Message: 5
> Date: Tue, 2 Jul 2013 18:07:14 +0200
> From: Pieter Colpaert <pieter.colpaert at okfn.org>
> Subject: Re: [okfn-labs] Using Git (and Github) for Data - a Data
>    Pattern
> To: Rufus Pollock <rufus.pollock at okfn.org>
> Cc: okfn-labs <okfn-labs at lists.okfn.org>, swtf at elis.ugent.be
> Message-ID:
>    <CAP5Wo12vJ50bWB7_0PcEps9ijU+4pNv1uBQN==3bNYdxG1UUHg at mail.gmail.com>
> Content-Type: text/plain; charset="windows-1252"
> 
> Hi Rufus,
> 
> We are working very hard to get that working by the end of October. We'll
> keep you posted :)
> 
> Kind regards,
> 
> Pieter
> 
> 
> 
> ------------------------------
> 
> _______________________________________________
> okfn-labs mailing list
> okfn-labs at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/okfn-labs
> Unsubscribe: http://lists.okfn.org/mailman/options/okfn-labs
> 
> 
> End of okfn-labs Digest, Vol 30, Issue 3
> ****************************************



