[okfn-discuss] Using Git (and Github) for Data - a Data Pattern

Rufus Pollock rufus.pollock at okfn.org
Tue Jul 2 16:55:32 UTC 2013


Hi folks,

I wanted to let people know about a new post I published today, titled
"Git (and Github) for data":

http://blog.okfn.org/2013/07/02/git-and-github-for-data/

<excerpt>
The ability to do “version control” for data is a big deal. There are various
options but one of the most attractive is to reuse existing tools for doing
this with code, like git and mercurial. This post describes a simple “data
pattern” for storing and versioning data using those tools which we’ve been
using for some time and found to be very effective.
</excerpt>

The basic pattern is very simple and probably familiar to lots of folks
already:

1. Store data as line-oriented text, specifically as CSV files. “Line
oriented text” just means that each individual unit of the data, such as a
row of a table (or an individual cell), corresponds to one line.

2. Use best-of-breed (code) versioning tools like git or mercurial to store
and manage the data.
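
To illustrate why the line-oriented part matters, here is a minimal sketch in
Python. The dataset, the file names a/data.csv and b/data.csv, and the helper
to_csv_lines are all made up for illustration and are not from the post. It
uses the standard library's difflib as a stand-in for the line-based diff that
tools like git produce, and shows that changing one cell changes exactly one
line.

```python
import csv
import difflib
import io


def to_csv_lines(rows):
    """Serialise rows to CSV text and return it as a list of lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue().splitlines(keepends=True)


# Hypothetical dataset: two versions that differ in a single cell.
old_rows = [
    ["country", "year", "population"],
    ["Atlantis", "2012", "1000"],
    ["Utopia", "2012", "2000"],
]
new_rows = [
    ["country", "year", "population"],
    ["Atlantis", "2012", "1000"],
    ["Utopia", "2012", "2050"],
]

# Line-based unified diff, similar in spirit to what a VCS shows.
diff = list(
    difflib.unified_diff(
        to_csv_lines(old_rows),
        to_csv_lines(new_rows),
        fromfile="a/data.csv",
        tofile="b/data.csv",
    )
)

# Keep only removed/added lines, skipping the "---" / "+++" file headers.
changed = [
    line
    for line in diff
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]

print("".join(diff), end="")
```

Because each row sits on its own line, the diff contains one removed line and
one added line for the edited row, which is what keeps history, review and
merging manageable when the file lives in a repository.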

As people may know, this is exactly the model that has been in use for a
while at https://github.com/datasets and http://data.okfn.org/

Regards,

Rufus
