[okfn-labs] [idea-rfc]: DataPipes - Streaming Online Data Transformation!

Rufus Pollock rufus.pollock at okfn.org
Wed May 8 17:17:51 UTC 2013


@Medhi and Emanuil: thank you both for highlighting Yahoo Pipes; it's
a good point.

Some thoughts on differences:

* It's open-source, deployable and hackable by anyone
   - this is also relevant for the future (will Y! Pipes still be around?)
* DataPipes is focused on "data-y" stuff rather than "web-y" stuff (CSV
and possibly plain text rather than RSS etc.)
* More geeky: it's about URLs and curl etc. rather than a visual
designer (the aim is automating some tasks)
   - there's also a focus on Unix-style commands on files (repeat what you
do locally, or share that with others)
   - share the same command and run it against different URLs
* Streaming data (and more of it): it may just be my impression, but
Yahoo Pipes datasets often seem to be very small (e.g. 10-100 lines),
whereas this would be for CSV files that could have thousands of rows
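To make the streaming, Unix-style idea concrete, here is a minimal Python sketch of composable operations over CSV rows. It is not DataPipes' actual code (the prototype is in node.js); the operation names cut, grep and head, the "name arg | name arg" spec syntax, and the helpers parse_pipeline and run are all hypothetical. Each stage is a lazy iterator, so rows flow through one at a time and a file with thousands of rows never has to sit in memory, while the same spec string can be shared and run against any source of lines.

```python
import csv
import io
import itertools
import re
from typing import Callable, Dict, Iterable, Iterator, List

Row = List[str]
Op = Callable[[Iterator[Row]], Iterator[Row]]


# Hypothetical operations, loosely modelled on the Unix tools of the same name.
def cut(columns: List[int]) -> Op:
    """Keep only the given 0-based column indexes (cf. `cut -f`)."""
    def op(rows: Iterator[Row]) -> Iterator[Row]:
        for row in rows:
            yield [row[i] for i in columns]
    return op


def grep(pattern: str) -> Op:
    """Keep rows in which any cell matches the regex (cf. `grep`)."""
    regex = re.compile(pattern)

    def op(rows: Iterator[Row]) -> Iterator[Row]:
        for row in rows:
            if any(regex.search(cell) for cell in row):
                yield row
    return op


def head(n: int) -> Op:
    """Keep the first n rows, then stop pulling from upstream (cf. `head -n`)."""
    def op(rows: Iterator[Row]) -> Iterator[Row]:
        return itertools.islice(rows, n)
    return op


# Hypothetical spec syntax: "name arg | name arg | ...".
OPS: Dict[str, Callable[[str], Op]] = {
    "cut": lambda arg: cut([int(c) for c in arg.split(",")]),
    "grep": grep,
    "head": lambda arg: head(int(arg)),
}


def parse_pipeline(spec: str) -> List[Op]:
    """Turn e.g. "grep London | cut 1" into a list of operations."""
    ops = []
    for stage in spec.split("|"):
        name, arg = stage.strip().split(" ", 1)
        ops.append(OPS[name](arg.strip()))
    return ops


def run(spec: str, source: Iterable[str]) -> Iterator[Row]:
    """Lazily apply a pipeline spec to any iterable of CSV text lines."""
    rows: Iterator[Row] = csv.reader(source)
    for op in parse_pipeline(spec):
        rows = op(rows)  # chain lazy iterators; no input is read yet
    return rows


if __name__ == "__main__":
    data = io.StringIO("city,amount\nLondon,10\nParis,20\nLondon,30\n")
    for row in run("grep London | cut 1", data):
        print(row)  # prints ['10'], then ['30']
```

Because csv.reader accepts any iterable of text lines, the same spec could in principle be pointed at a remote CSV by wrapping urllib.request.urlopen(url) in io.TextIOWrapper (assuming the server returns UTF-8). Like the Unix tools, these operations treat the header row as an ordinary row.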

@Emanuil: thanks for the pointer to local scripting tools. I do a lot
locally with bash, Python and Node. If you are working locally with
CSV files, I highly recommend tools like csvkit!

A concrete motivation was seeing people working to get OpenSpending
data cleaned up and wanting a simple online way to do the easy
stuff in a *shareable* way, without having to install a
specialist tool (be that Refine, Python or ...)
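The comma-in-amounts case from the original post (quoted below) is a good example of the "easy stuff". Here is a minimal Python sketch, assuming the amounts sit in a column identified by its header name (the hypothetical "amount" in the demo) and that the commas are thousands separators rather than decimal marks; strip_commas is a hypothetical helper, not part of DataPipes.

```python
import csv
import io
from typing import Iterable, TextIO


def strip_commas(source: Iterable[str], out: TextIO, column: str) -> None:
    """Copy CSV from source to out, removing commas from one named column.

    Rows are handled one at a time, so memory use stays flat however long
    the file is. When reading a real file, open it with newline="" as the
    csv module documentation recommends.
    """
    reader = csv.reader(source)
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    idx = header.index(column)  # raises ValueError if the column is missing
    writer.writerow(header)
    for row in reader:
        if idx < len(row):  # leave short or ragged rows untouched
            # "1,234,567" -> "1234567"; assumes commas are thousands separators
            row[idx] = row[idx].replace(",", "")
        writer.writerow(row)


if __name__ == "__main__":
    src = io.StringIO('name,amount\nSchools,"1,234,567"\nRoads,"89,000"\n')
    out = io.StringIO()
    strip_commas(src, out, "amount")
    print(out.getvalue(), end="")
    # name,amount
    # Schools,1234567
    # Roads,89000
```

Since both ends are plain streams, a web-service version could, in principle, connect the incoming request body to source and the response to out; commas in other columns are left alone and re-quoted by csv.writer as needed.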

Rufus

On 6 May 2013 17:19, Emanuil Tolev <emanuil at cottagelabs.com> wrote:
> Hi Rufus,
>
> Anything like http://pipes.yahoo.com/pipes/ ? (Note: I haven't had time to
> use it yet, so can't vouch for suitability, but it seems like the right
> thing.)
>
> I would be glad to see components that can be integrated as well (but I
> like the streaming data idea).
> They probably exist, but mostly don't seem to match exactly what I need to
> do a specific job quickly, so things like
> https://github.com/CottageLabs/metadata-enhancement/blob/master/csv_utils.py
> get written, and clearly many people need to do similar tasks :).
>
>
> Greetings,
> Emanuil
>
>
> On 6 May 2013 15:49, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>>
>> At last week's Open Data Maker Night here in London some of us [1] started
>> kicking around an idea we called Data Pipes. The basic pitch was [2]:
>>
>> Data Pipes would be a service to do streaming online data transformation.
>> It is heavily inspired by the Unix shell, with its pipes and utilities like
>> cut, grep, sed, sort, uniq etc. We want to work with streams, so the focus
>> (initially) is on CSV files.
>>
>> To demonstrate the idea, the barest prototype has been put together:
>>
>> http://datapipes.okfnlabs.org/ (source code on GitHub)
>>
>> This is barely functional (there's just one working operation, delete, at
>> the moment), but there are plans for many more, and I already like how
>> natural this feels in node.js.
>>
>> Is this useful? Do people have tips (e.g. how best to stream POST data in
>> node.js)? Is anyone up for contributing?
>>
>> Regards,
>>
>> Rufus
>>
>> [1]: specifically Ross Jones, James Smith, David Miller and myself. Plus,
>> from comments on IRC, I think Friedrich (Lindenberg) had also been thinking
>> along similar lines!
>>
>> [2]: the immediate motivation was a relatively non-techy participant at
>> the Open Data Maker Night who wanted to remove commas from amounts in a CSV
>> column before putting the data into OpenSpending. A common enough
>> requirement, but one that would take some spreadsheet-fu or scripting to
>> sort out. Why, we thought, shouldn't this just be a simple web service ...
>>
>> _______________________________________________
>> okfn-labs mailing list
>> okfn-labs at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/okfn-labs
>> Unsubscribe: http://lists.okfn.org/mailman/options/okfn-labs
>>
>



