[okfn-labs] [idea-rfc]: DataPipes - Streaming Online Data Transformation!

Rufus Pollock rufus.pollock at okfn.org
Wed May 8 17:27:00 UTC 2013


On 6 May 2013 20:05, Lucy Chambers <lucy.chambers at okfn.org> wrote:
> Hi Rufus,
>
> I like the sentiment, I'm just wondering whether it would be easier to use
> for a non-techie (if that is indeed your audience) than e.g. Open Refine to
> do the same thing (e.g. removing commas)?

You can do many of these operations in lots of ways, starting with a
simple spreadsheet (which I'd generally recommend over Refine for
something this simple). The motivation here was to have something that is:

- simple (the nearest rival would be just doing the clean-up in your
favourite spreadsheet)
- repeatable and shareable (how could I or others run this again and again?)

This latter point is especially important. Part of my motivation came
from the fact that I've recently ended up storing bash scripts like
this:

https://github.com/rgrp/dataset-gla/blob/master/scripts/clean.sh

This cleans up the February CSV spending data from the Greater London Authority.
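
To give a flavour of what "repeatable and shareable" means in practice, here
is a minimal sketch of one such clean-up step (removing thousands-separator
commas from numeric cells) in Python. It is not the contents of clean.sh; the
function names and the sample data are hypothetical, and it assumes plain
comma-delimited CSV input.

```python
import csv
import io


def strip_thousands_commas(value):
    """Drop commas from a cell only if what remains parses as a number."""
    candidate = value.replace(",", "")
    try:
        float(candidate)
    except ValueError:
        # Not numeric (e.g. a supplier name containing a comma): leave as-is.
        return value
    return candidate


def clean_csv(src, dst):
    """Read CSV rows from src, clean every cell, write the rows to dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow([strip_thousands_commas(cell.strip()) for cell in row])


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = io.StringIO('Supplier,Amount\n"Acme, Ltd","1,234.50"\n')
    out = io.StringIO()
    clean_csv(sample, out)
    print(out.getvalue(), end="")
```

Because the step is a small script rather than a sequence of clicks in a
spreadsheet, anyone can rerun it on next month's file and get the same result.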

> Or is the point that this would be automated so that you could run common
> transformations automatically (e.g. without having to know commands in Open
> Refine)?

Yes

> Apologies if I've missed the point - not familiar with pipes :)

Great questions :-)

> Perhaps a concrete example would help, and as I'm currently writing up an
> ecosystem of tools for working with spending data, I'd be keen to offer up
> spending as one if that would work!

OK, please do share - this came precisely out of working to get data
into OpenSpending.

Rufus
