[okfn-labs] Frictionless Data Vision and Roadmap
kev at dataunity.org
Thu Jan 23 17:08:59 UTC 2014
Great, I like the vision. The cooking metaphor really works - with data
we're generally mixing up different ingredients (data) and recipes are
often made of several smaller recipes that can be reused elsewhere.
In the next few weeks I'll be trying to create a Semantic Web vocab to
formalise some of these ideas (I need them for the internals of the project
I'm working on to store data queries in an implementation independent
format). At the moment the recipes for data processing are often embedded
in scripts (like Python or SQL) so it's tricky to get reuse out of them.
However, if we have a declarative way of specifying the common operations in
a dataflow it should make things easier to understand.
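To make that a bit more concrete, here is a rough sketch in Python of what I mean by "declarative". It is only an illustration, not the vocab itself: the operation names (filter, select, sort), the step names and the run_recipe helper are all made up for the example. The point is that the recipe is plain data describing a small DAG of steps, and a separate interpreter decides how to run it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical operations: each one is a plain function over a list of row dicts.
OPERATIONS = {
    "filter": lambda rows, column, equals: [r for r in rows if r[column] == equals],
    "select": lambda rows, columns: [{c: r[c] for c in columns} for r in rows],
    "sort": lambda rows, by: sorted(rows, key=lambda r: r[by]),
}

# The recipe is pure data: every step names an operation, the step it reads
# from, and its parameters. "source" stands for the raw input rows.
RECIPE = {
    "uk_only": {"op": "filter", "input": "source",
                "params": {"column": "country", "equals": "UK"}},
    "trimmed": {"op": "select", "input": "uk_only",
                "params": {"columns": ["city", "population"]}},
    "ordered": {"op": "sort", "input": "trimmed",
                "params": {"by": "population"}},
}


def run_recipe(recipe, source_rows):
    """Run every step in dependency order and return all intermediate results."""
    # Map each step to the set of steps it depends on (here: a single input).
    graph = {name: {step["input"]} for name, step in recipe.items()}
    results = {"source": source_rows}
    for name in TopologicalSorter(graph).static_order():
        if name in results:  # the raw source needs no processing
            continue
        step = recipe[name]
        operation = OPERATIONS[step["op"]]
        results[name] = operation(results[step["input"]], **step["params"])
    return results
```

Because RECIPE contains no code, the same description could in principle be handed to a different interpreter (one that generates SQL, say) without changing the recipe itself.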
Workflow diagrams seem to be a natural 'DSL' for data processing so I'm
focussing on those (Directed Acyclic Graphs), similar to Cascading
workflows. If anyone else is working in this area, it would be great to
I'm especially interested in a way to encapsulate the reusable parts of a
data flow. In the cooking metaphor I guess you'd say that a recipe can be
the ingredient of another recipe. Using Semantic Web we should have a
framework for publishing dataflow logic so we can build up libraries of
common processes that can be strung together. I can see it being useful for
things like showing how a data set can be cleaned up in an implementation
Sorry I can't make the Labs Hangout, hope it goes well.
On 21 January 2014 14:03, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> *There is now a short Frictionless Data "vision" doc online at:*
> It is based on input from various people and comments would be warmly
> welcome. I've excerpted some of it below for those who prefer info in the
> mail client.
> ## Frictionless Data Ecosystem
> There's too much friction working with data - friction getting data,
> friction processing data, friction sharing data.
> This friction stops people doing stuff: stops them creating, sharing,
> collaborating, and using data - especially amongst more distributed
> It kills the cycles of find, improve, share that would make for a dynamic,
> productive and attractive (open) data ecosystem.
> We need to make an ecosystem that, like open-source for software, is
> useful and attractive to those without any principled interest, the vast
> majority who simply want the best tool for the job, the easiest route to
> their goal.
> We think that by getting a few key pieces in place we can reduce friction
> enough to revolutionize how the (open) data ecosystem operates with
> massively improved data quality, utilization and sharing.
> We think this because there's a multiplier here that means relatively
> small changes can have big effects. This multiplier is network effects: the
> utility of a particular standard, pattern or even tool depends on how many
> other people are using it. This means that creating a critical mass of use
> around the tooling and standards will have a huge effect. This isn't easy.
> But after working on these issues for nearly a decade we think the time is
> ## A Metaphor
> Today, when you decide to cook, the ingredients are readily available at
> local supermarkets or even already in your kitchen. You don't need to
> travel to a farm, collect eggs, mill the corn, cure the bacon etc - as you
> once would have done! Instead, thanks to standard systems of measurement,
> packaging, shipping (e.g. containerization) and payment, ingredients can get
> from the farm direct to your local shop or even your door.
> But with data we're still largely stuck at this early stage: every time
> you want to do an analysis or build an app you have to set off around the
> internet to dig up data, extract it, clean it and prepare it before you can
> even get it into your tool and begin your work proper.
> What do we need to do for working with data to be like cooking today -
> where you get to spend your time making the cake (creating insights) not
> preparing and collecting the ingredients (digging up and cleaning data)?
> The answer: radical improvements in the "logistics" of data associated
> with specialisation and standardisation. In analogy with food we need
> standard systems of "measurement", packaging, and transport so that it's
> easy to get data from its original source into the application where I can
> start working with it.
> ## What We Want To Do
> We start with an advantage: unlike for physical goods, transporting digital
> information from one computer to another is very cheap!
> This means the focus can be on standardizing and simplifying the process
> of getting data from one application to another (or one form to another).
> The following gives an overview of the main areas of work. There is more
> detail in the Roadmap <http://data.okfn.org/roadmap>.