[openspending-dev] [okfn-labs] Data validation reporting

Friedrich Lindenberg friedrich.lindenberg at okfn.org
Wed Nov 19 12:50:18 UTC 2014


What I would really need is a stand-alone, lightweight structured
logging system for ETL processes in Python. That would be the more general
version of what we were doing with the UK25k stuff a few years ago, and it
would not overlap with CSVLint. Features:

* Be able to log structured data from ETL processes
* Generate custom, jinja2-based reports from them
* Have a set of pre-defined gauges to test for things like null values,
  extreme values, etc.
* Have an emailer for certain really nasty events
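
To make the "gauges" idea concrete, here is a rough sketch of how two of
them could look. Everything in it is an assumption for illustration: the
names null_gauge and extreme_gauge, the event dictionaries and the sample
rows are hypothetical and not taken from any existing library.

```python
def null_gauge(rows, column):
    """Yield one event per row whose `column` is missing or empty."""
    for index, row in enumerate(rows):
        if row.get(column) in (None, ""):
            yield {"gauge": "null", "row": index, "column": column}


def extreme_gauge(rows, column, low, high):
    """Yield one event per row whose numeric `column` is outside [low, high]."""
    for index, row in enumerate(rows):
        value = row.get(column)
        if value in (None, ""):
            continue  # missing values are left to null_gauge
        if not low <= float(value) <= high:
            yield {"gauge": "extreme", "row": index,
                   "column": column, "value": value}


# Hypothetical sample data, as an ETL step might see it after parsing a CSV.
rows = [
    {"supplier": "Acme", "amount": "120.50"},
    {"supplier": "", "amount": "99999999"},
    {"supplier": "Initech", "amount": None},
]

events = list(null_gauge(rows, "supplier"))
events += list(null_gauge(rows, "amount"))
events += list(extreme_gauge(rows, "amount", 0, 1_000_000))

for event in events:
    print(event)
```

Each gauge only emits plain dictionaries, so the same events could be fed
to a report template or to an emailer without the gauges knowing about
either.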

Here's a very duct-tapey version of this:
https://github.com/pudo/scrapekit/blob/master/scrapekit/logs.py - I basically
just hacked the Python logger to emit JSON. I wanted to extract it as
"reportkit", but haven't gotten around to that.
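
For readers who have not seen the trick, a minimal sketch of getting JSON
out of the standard Python logger might look like the following. This is
not the scrapekit code; the JSONFormatter class, the make_logger helper
and the convention of passing fields under a "data" key are assumptions
made up for this example. Only the standard library is used.

```python
import io
import json
import logging


class JSONFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields arrive via `extra={"data": {...}}`; the "data"
        # key is a convention of this sketch, not of scrapekit or logging.
        payload.update(getattr(record, "data", {}))
        return json.dumps(payload, sort_keys=True, default=str)


def make_logger(name, stream):
    """Build a logger that writes JSON lines to `stream` (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JSONFormatter())
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()
log = make_logger("etl.demo", buffer)
log.warning("null amount", extra={"data": {"row": 17, "column": "amount"}})
print(buffer.getvalue(), end="")
```

Because every line is a self-contained JSON object, a later reporting step
can read the log back with json.loads and group or count events as needed.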

- Friedrich


On Wed, Nov 19, 2014 at 1:05 PM, Ross Jones <ross at servercode.co.uk> wrote:

> Oh I see.  If it’s any use, the csvlint user stories from the workshops
> that were run are at
> https://docs.google.com/spreadsheet/ccc?key=0AiswT8ko8hb4dERHUVBKYlBZVnlYSHI5M2V1TVpodlE&usp=sharing#gid=0
>
> Ross
>
>
> On 19 Nov 2014, at 09:38, Paul Walsh <paulywalsh at gmail.com> wrote:
>
> Hi Ross,
>
> Yes, I’ve looked at csvlint. Before getting to the solution to the problem
> (csvlint, something else) we first want to ensure we know the scope of the
> problem itself, define use cases, etc.
>
> But sure, csvlint does meet some of the current requirements we have in
> mind.
>
> Paul
>
>
> On 19 Nov 2014, at 11:09, Ross Jones <ross at servercode.co.uk> wrote:
>
> Hi Paul,
>
> On 19 Nov 2014, at 09:06, Paul Walsh <paulywalsh at gmail.com> wrote:
> Hi all,
>
> I’m working on data validation (particularly *tabular* data validation)
> with Rufus.
>
> In particular, we are looking to provide a great interface to *reporting*
> on the validation flow. In general, this means error reports resulting from
> the validation process, but also summary stuff (what happened, data stats).
>
>
>
> Have you investigated http://csvlint.io (https://github.com/theodi/csvlint)
> yet?  That seems to solve most of the problems that you mentioned, and I’m
> sure it could be extended to support the others.
>
>
> Ross
>