[okfn-labs] [idea-rfc]: DataPipes - Streaming Online Data Transformation!

Daniel Maxwell dan at thesupernode.org
Tue May 7 08:39:52 UTC 2013


Can I point you to Livesheets (http://livesheets.com), which is an example
of this type of concept? It lets you share a node that performs a
particular action.

It has just been transformed into a non-profit foundation (the Supernode
Foundation - http://thesupernode.org) with the aim of connecting all data,
functions and visualisations.

A new version is under way, which is vastly more powerful than the old one,
and will eventually be able to work with any type of data.

Dan


On 7 May 2013 09:20, <okfn-labs-request at lists.okfn.org> wrote:

> Today's Topics:
>
>    1. Re: [idea-rfc]: DataPipes - Streaming Online Data
>       Transformation! (Emanuil Tolev)
>    2. Re: [idea-rfc]: DataPipes - Streaming Online Data
>       Transformation! (Lucy Chambers)
>    3. Re: [idea-rfc]: DataPipes - Streaming Online Data
>       Transformation! (Michael Bauer)
>    4. Open Humanities Hangout Today at 5pm BST (Sam Leon)
>    5. Re: [idea-rfc]: DataPipes - Streaming Online Data
>       Transformation! (Ross Jones)
>
>
> ---------- Forwarded message ----------
> From: Emanuil Tolev <emanuil at cottagelabs.com>
> To: okfn-labs <okfn-labs at lists.okfn.org>
> Cc:
> Date: Mon, 6 May 2013 17:19:16 +0100
> Subject: Re: [okfn-labs] [idea-rfc]: DataPipes - Streaming Online Data
> Transformation!
> Hi Rufus,
>
> Anything like http://pipes.yahoo.com/pipes/ ? (Note: I haven't had time
> to use it yet, so can't vouch for suitability, but it seems like the right
> thing.)
>
> I would be glad to see integratable components as well (but I like the
> streaming data idea).
> They probably exist, but mostly don't seem to match exactly what I'm
> looking for to do a specific job quickly, and then things like
> https://github.com/CottageLabs/metadata-enhancement/blob/master/csv_utils.py
> occur, and clearly many people need to do similar tasks :).
>
>
> Greetings,
> Emanuil
>
>
> On 6 May 2013 15:49, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>
>> At last week's Open Data Maker Night here in London some of us [1]
>> started kicking around an idea we called *Data Pipes*. The basic pitch
>> was [2]:
>>
>> *Data Pipes would be a service to do streaming online data transformation.
>> Heavily inspired by unix shell with its pipes and utilities like cut,
>> grep, sed, sort, uniq etc. We want to work with streams so focus
>> (initially) is on CSV files.*
>>
>> As a demonstration of the idea the barest prototype has been put together:
>>
>> http://datapipes.okfnlabs.org/  -  (source code on github
>> <https://github.com/okfn/datapipes>)
>>
>> This is barely functional - there's just one working operation (delete)
>> atm - but there are plans for many more
>> <https://github.com/okfn/datapipes/issues/9> and I already like how
>> natural this feels in node.js.
>>
>> Is this useful? Do people have tips (e.g. how best to stream post data
>> in node.js <https://github.com/okfn/datapipes/issues/5>)? Is anyone up
>> for contributing <https://github.com/okfn/datapipes/issues>?
>>
>> Regards,
>>
>> Rufus
>>
>> [1]: specifically Ross Jones, James Smith, David Miller and myself. Plus,
>> from comments on IRC, I think Friedrich (Lindenberg) had also been thinking
>> along similar lines!
>>
>> [2]: the immediate motivation was a relatively non-techie participant at
>> the open data maker night who wanted to remove commas from amounts in a CSV
>> column before putting the data into OpenSpending. A common enough
>> requirement but one which would involve some spreadsheet-fu or scripting to
>> sort out. Why, we thought, shouldn't this just be a simple web-service ...
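
The comma-stripping case in [2] is small enough to sketch. The following is a minimal Python version, assuming the commas are thousands separators (not decimal commas) and that the amounts sit in a single known column; `strip_commas` and `clean_csv` are hypothetical names, not part of datapipes.

```python
import csv
import io
from typing import Iterable, Iterator, List


def strip_commas(rows: Iterable[List[str]], column: int) -> Iterator[List[str]]:
    """Yield rows with commas removed from the cell at index `column`."""
    for row in rows:
        if column < len(row):
            row = list(row)  # copy so the caller's row is left untouched
            row[column] = row[column].replace(",", "")
        yield row


def clean_csv(source: Iterable[str], column: int) -> Iterator[str]:
    """Stream CSV lines in and cleaned CSV lines out, one row at a time."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in strip_commas(csv.reader(source), column):
        writer.writerow(row)
        yield buffer.getvalue()
        # Reset the buffer so it only ever holds the current row.
        buffer.seek(0)
        buffer.truncate(0)
```

`source` can be any iterable of lines, such as an open file or the lines of an HTTP response body, so a web service wrapping this would not need to hold the whole upload in memory. Rows that are too short to have the column are passed through unchanged.
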
>>
>> _______________________________________________
>> okfn-labs mailing list
>> okfn-labs at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/okfn-labs
>> Unsubscribe: http://lists.okfn.org/mailman/options/okfn-labs
>>
>>
>
>
> ---------- Forwarded message ----------
> From: Lucy Chambers <lucy.chambers at okfn.org>
> To: emanuil at cottagelabs.com
> Cc: okfn-labs <okfn-labs at lists.okfn.org>
> Date: Mon, 6 May 2013 15:05:26 -0400
> Subject: Re: [okfn-labs] [idea-rfc]: DataPipes - Streaming Online Data
> Transformation!
> Hi Rufus,
>
> I like the sentiment; I'm just wondering whether it would be easier to use
> for a non-techie (if that is indeed your audience) than e.g. Open Refine to
> do the same thing (e.g. removing commas)?
>
> Or is the point that this would be automated so that you could run common
> transformations automatically (e.g. without having to know commands in Open
> Refine)?
>
> Apologies if I've missed the point - not familiar with pipes :)
>
> Perhaps a concrete example would help, and as I'm currently writing up an
> ecosystem of tools for working with spending data, I'd be keen to offer up
> spending as one if that would work!
>
> Lucy
>
>
>
> --
> *Project Coordinator*
> School of Data <http://schoolofdata.org/> and
> OpenSpending <http://openspending.org/>
> Projects of the Open Knowledge Foundation <http://okfn.org/>
> Support our work <http://okfn.org/support/>.
>
>
>
>
> ---------- Forwarded message ----------
> From: Michael Bauer <michael.bauer at okfn.org>
> To: Rufus Pollock <rufus.pollock at okfn.org>
> Cc: okfn-labs <okfn-labs at lists.okfn.org>
> Date: Tue, 7 May 2013 10:05:24 +0200
> Subject: Re: [okfn-labs] [idea-rfc]: DataPipes - Streaming Online Data
> Transformation!
> Hi,
>
> Currently datapipes.okfn.org leads me to okfn.org/register
>
> I do like the idea of unix pipes - however it's hard for me to imagine it
> working on a web level. Wouldn't this be just a transforming version of the
> data proxy? Where I have a source, several steps and an output?
>
> Will you allow me to use my own scripting to define new elements in
> pipes? e.g. a new transformation in the pipeline?
>
> At one point we will reach the issue of running proprietary untrusted code
> on our server (with map, reduce and filter functions) - I'd propose to use
> a language that supports proper sandboxing underneath. (Not sure node.js
> does so). Also I don't like JavaScript as a data handling language: it's
> simply not designed for it (yes I'm a hopeless lambdahead).
>
> Right now the pipe is simply one block; combining multiple pipes seems
> painful - can we make this easier?
>
> Michael
>
>
> --
> Data Wrangler with the Open Knowledge Foundation (OKFN.org)
> GPG/PGP key: http://tentacleriot.eu/mihi.asc
> Twitter: @mihi_tr Skype: mihi_tr
>
>
>
>
> ---------- Forwarded message ----------
> From: Sam Leon <sam.leon at okfn.org>
> To: okfn-labs <okfn-labs at lists.okfn.org>, A list for people interested in
> the use of open source tools and open access in humanities teaching and
> research <open-humanities at lists.okfn.org>
> Cc:
> Date: Tue, 7 May 2013 09:18:54 +0100
> Subject: [okfn-labs] Open Humanities Hangout Today at 5pm BST
> Hi everyone,
>
> We'll be running an Open Humanities Hangout this evening from 5pm BST
> <http://okfn.org/events/hangouts/open-humanities-hangout/>.
>
> If you're a hacker, or simply someone with no technical skills who's
> interested in how openness can benefit humanities research - pop in! We
> hope to continue working on TEXTUS, our plans for which are documented here
> <http://okfn.org/events/hangouts/open-humanities-hangout/>.
>
> I'll circulate the Google Hangout link 5 minutes before. If you'll be
> there, drop me a message during the day!
>
> Cheers,
> Sam
>
> --
> Sam Leon
> Project Manager
> Open Knowledge Foundation
> http://okfn.org/
> Skype: samedleon
>
>
> ---------- Forwarded message ----------
> From: Ross Jones <ross at servercode.co.uk>
> To: Michael Bauer <michael.bauer at okfn.org>
> Cc: okfn-labs <okfn-labs at lists.okfn.org>
> Date: Tue, 7 May 2013 09:20:53 +0100
> Subject: Re: [okfn-labs] [idea-rfc]: DataPipes - Streaming Online Data
> Transformation!
> Hi,
>
> On 7 May 2013, at 09:05, Michael Bauer <michael.bauer at okfn.org> wrote:
>
> > I do like the idea of unix pipes - however it's hard for me to imagine it
> > working on a web level. Wouldn't this be just a transforming version of the
> > data proxy? Where I have a source, several steps and an output?
>
>
> There's a precedent for thinking that pipes can work on the web (
> http://www.webpipes.org/), I guess on the basis that they both work with
> streams (throughput and latency issues aside). Part of my thoughts on
> Rufus' original idea was to build a DSL that, even though pulling data from
> remote locations, would do all of the processing locally, but I think I'm
> sold on trying this approach first (and I think there may already be more
> than one DSL for working with CSV/XSL etc).
>
> > At one point we will reach the issue of running proprietary untrusted code
> > on our server (with map, reduce and filter functions) - I'd propose to use
> > a language that supports proper sandboxing underneath. (Not sure node.js
> > does so). Also I don't like javascript as a data handling language: it's
> > simply not designed for it (yes I'm a hopeless lambdahead).
>
>
> That's definitely an issue, and I was thinking of abusing the fact that
> ScraperWiki is open source to build a really lightweight version that is
> *just* about sandboxed code execution (I know the old codebase pretty well)
> - also it's a good excuse to play with docker.io ;).
>
> I still haven't 100% figured out how we'd cleanly stream the data through
> the sandbox (unless the whole app was inside it) but I've got the various
> parts floating around in my head, I just need more coffee and time to make
> the ideas more concrete.
>
> Ross.
>