[okfn-labs] Using Git (and Github) for Data - a Data Pattern

Rufus Pollock rufus.pollock at okfn.org
Tue Jul 2 14:12:07 UTC 2013


Hi folks,

I wanted to give a heads up on a new post I've put out today under the
title of "Git (and Github) for data":

<http://blog.okfn.org/2013/07/02/git-and-github-for-data/>

<excerpt>
The ability to do “version control” for data is a big deal. There are
various options but one of the most attractive is to reuse existing
tools for doing this with code, like git and mercurial. This post
describes a simple “data pattern” for storing and versioning data
using those tools which we’ve been using for some time and found to be
very effective.
</excerpt>

The basic pattern is very simple and probably familiar to lots of folks here:

1. Store data as line-oriented text, specifically as CSV files.
“Line-oriented text” just means that each individual unit of the data,
such as a row of a table (or an individual cell), corresponds to one
line.

2. Use best-of-breed (code) versioning tools like git or mercurial to
store and manage the data.
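
To illustrate why the line-oriented part matters, here is a minimal
sketch in Python using only the standard library. The rows, the file
name "data.csv" and the helper to_csv_lines are made up for
illustration and do not come from the post. It writes a small table as
CSV with one record per line, corrects a single cell, and computes the
same kind of line-based diff that git or mercurial would show.

```python
import csv
import difflib
import io


def to_csv_lines(rows):
    """Serialize rows to CSV text, one record per line (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue().splitlines(keepends=True)


# Hypothetical data, invented purely for illustration.
old_rows = [
    ["country", "year", "value"],
    ["AA", "2011", "10"],
    ["BB", "2011", "20"],
    ["CC", "2011", "30"],
]

# Copy the table and correct a single cell in one row.
new_rows = [row[:] for row in old_rows]
new_rows[2][2] = "25"

# Line-based diff between the two versions of the file.
diff = list(
    difflib.unified_diff(
        to_csv_lines(old_rows),
        to_csv_lines(new_rows),
        fromfile="data.csv (old)",
        tofile="data.csv (new)",
    )
)

# Keep only added/removed lines, skipping the two file header lines.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print("".join(diff), end="")
```

Because each record sits on its own line, the one-cell correction shows
up as exactly one removed line and one added line, which is what makes
the history readable in a code versioning tool.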

As people here know, this is exactly the model that has been in use for
a while at <https://github.com/datasets> and <http://data.okfn.org/>

Regards,

Rufus

PS: I should also add that the appearance of this post at the same
time as Max Ogden's recent Dat efforts is an entirely fortuitous
coincidence - the original draft of this post was done a while ago
(polishing for actual publication always gets put off!), though I have
sought to add in some additional links in light of recent
developments!



