[okfn-labs] Data validation workflow management

Tom Morris tfmorris at gmail.com
Mon Nov 5 20:50:58 UTC 2012


I think this is a general need, and I can't imagine that there aren't
government projects (open or not) which need it as well.  Basically
you are talking about a microtask or collaboration workflow framework.  It
doesn't have to be just validation; it could be data entry, data cleaning,
or any number of other similar tasks.

A use case near and dear to my heart is data cleaning for large data sets.
If it's small enough for one person to do, you can use OpenRefine
(ex-Google Refine) or Excel or a number of other tools.  If the cleanups
are regular enough to script, you can use Python or your favorite scripting
language.  However, if you need both scale and human intervention, you're
basically stuck with ad hoc partitioning of data sets by columns or rows,
merging subsets, etc.
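
As a rough illustration of the ad hoc partitioning and merging described
above, here is a minimal Python sketch.  The function names
(partition_rows, merge_parts) and the "city" column are hypothetical, and
the round-robin split is just one possible way to hand rows to several
people; none of it comes from an existing tool.

```python
def partition_rows(rows, n_parts):
    """Split rows into n_parts chunks, round-robin.

    Each row is paired with its original index so the chunks can be
    merged back into the original order after cleaning.
    """
    parts = [[] for _ in range(n_parts)]
    for idx, row in enumerate(rows):
        parts[idx % n_parts].append((idx, row))
    return parts


def merge_parts(parts):
    """Recombine (index, row) chunks into a single list in original order."""
    merged = [pair for part in parts for pair in part]
    merged.sort(key=lambda pair: pair[0])
    return [row for _, row in merged]


# Hypothetical data: each chunk would go to a different person for cleanup.
rows = [{"city": " paris "}, {"city": "LONDON"}, {"city": "berlin "}]
parts = partition_rows(rows, 2)

# Stand-in for the human step: normalise whitespace and capitalisation.
cleaned = [
    [(idx, {"city": row["city"].strip().title()}) for idx, row in part]
    for part in parts
]
cleaned_rows = merge_parts(cleaned)
```

Carrying the original row index through the split is what makes the merge
trivial; the hard parts in practice (tracking who has which chunk, review,
conflicts) are exactly what a workflow framework would need to add.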

There are markets like Amazon Mechanical Turk and tools like RABJ, PyBossa,
etc., but all of them require a fairly significant investment to get set up.
I'd love to see a listing of solutions or partial solutions in this space.

Tom


On Mon, Nov 5, 2012 at 1:55 PM, Jun Matsushita <jmatsushita at internews.eu> wrote:

> Dear list,
>
> This is my first post, so any guidance on how to better interact with you
> is most welcome.
>
> Tom Rees advised me to post my question to you.
>
> Several times in the past months, I have run into the same gap in very
> different contexts. An "abstracted" tool to fill it would basically:
>  - allow submitting evidence (anecdotal, scientifically grounded,
> crowdsourced, and most importantly a mix of these...) and,
>  - allow linking the submitted evidence to particular structured
> claims/facts (such as when fact-checking different aspects/components of a
> particular claim) and,
>  - allow reviewing/validating evidence through a range of different
> methodologies (peer review, automatic ranking/rating systems, validation
> workflows, ...).
>
> Another way to put this is that I started creating Google Spreadsheets
> which have 50+ collaborators, and managing the collaboration on individual
> rows is impossible with only versioning and crude permissions.
>
> As far as I know, this feature is sometimes included in existing products,
> but is not abstracted in a way that would allow plugging in different data
> store/collection components on the input side, and data
> visualisation/publication on the other side, in a way that is flexibly
> interface-able, configurable by non-techies and available as Open Source
> and SaaS so that it can be widely adopted. I've been pointed toward Indaba (
> http://getindaba.org/) as an existing closed source approach to a subset
> of this.
>
> My questions are:
>  - Do you think this is an actual need in data collection and analysis
> projects?
>  - Have you seen this type of workflow being repeatedly implemented in
> different ways in different software products?
>  - Do you know of efforts to build an open source software project that
> would achieve some or all of these features?
>
> Tom mentioned that Open Government projects usually don't need this type
> of workflow but suggested that Open Science projects might. I guess
> this type of verification workflow would be upstream of a platform like
> CKAN, although CKAN could provide extensions to have specific views
> relevant to datasets that are in perpetual validation.
>
> Hope to hear about whether this makes sense to anyone else!
>
> Best,
>
> Jun
>
> Jun Matsushita
> Head of Innovation and Technology