[openspending-dev] Data validation reporting
paulywalsh at gmail.com
Wed Nov 19 09:06:53 UTC 2014
I’m working on data validation (particularly *tabular* data validation) with Rufus.
Specifically, we are looking to provide a great interface for *reporting* on the validation flow. Mostly this means error reports produced by the validation process, but also summary information (what happened, data statistics).
I’m beginning to spec this, and looking for any initial thoughts/ideas/suggestions from the community on data validation reporting.
At a basic level, current thoughts are:
* JSON format for reports, with the possibility of adding report renderers (HTML, PDF, etc.)
* Report on the whole given file (or package of files)
  * Is it valid CSV?
  * Does it conform to a declared schema?
  * Are there empty rows or columns, or unknown columns?
* Report per row
  * Identify errors, which cell they occur in, and what the error is
* Actions - in some cases it may be possible to auto-correct certain errors or, for reporting, to tell the user explicitly what action to take to solve a particular problem (e.g. row 8 has a missing comma, probably between cells 5 and 6)
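To make the JSON report idea a bit more concrete, here is a minimal sketch in Python using only the standard library. The schema format (a dict mapping column name to "string" or "integer"), the check names and the report layout are all hypothetical, chosen only for illustration. The "valid CSV" check simply leans on the `csv` module's `strict` mode as a rough proxy and will not catch every malformed file.

```python
import csv
import io
import json

# Hypothetical schema format: column name -> expected type ("string" or "integer").
SCHEMA = {"id": "integer", "name": "string", "amount": "integer"}


def is_valid_value(value, expected_type):
    """Return True if a cell value matches the (assumed) expected type."""
    if expected_type == "integer":
        try:
            int(value)
        except ValueError:
            return False
    return True


def validate(text, schema):
    """Validate CSV text and return a JSON-serialisable report (a sketch)."""
    report = {"file": [], "rows": [], "summary": {}}
    try:
        # strict=True makes the reader raise csv.Error on some malformed quoting.
        rows = list(csv.reader(io.StringIO(text), strict=True))
    except csv.Error as exc:
        # Whole-file check: the text could not be parsed as CSV.
        report["file"].append({"check": "valid_csv", "message": str(exc)})
        report["summary"] = {"valid": False}
        return report

    header, body = (rows[0], rows[1:]) if rows else ([], [])

    # Whole-file checks: columns not in the schema, and schema columns that are absent.
    for name in header:
        if name not in schema:
            report["file"].append({"check": "unknown_column", "column": name})
    for name in schema:
        if name not in header:
            report["file"].append({"check": "missing_column", "column": name})

    # Per-row checks; row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(body, start=2):
        if not any(cell.strip() for cell in row):
            report["rows"].append({"row": row_number, "check": "empty_row"})
            continue
        if len(row) != len(header):
            report["rows"].append({
                "row": row_number,
                "check": "cell_count",
                "message": f"{len(row)} cells, expected {len(header)}",
                # Suggested action for the user (hypothetical wording).
                "action": "check this row for a missing or extra delimiter",
            })
            continue
        for col_number, (name, value) in enumerate(zip(header, row), start=1):
            expected = schema.get(name)
            # Empty cells are skipped here; columns outside the schema are not type-checked.
            if expected and value and not is_valid_value(value, expected):
                report["rows"].append({
                    "row": row_number,
                    "column": col_number,
                    "check": "type",
                    "message": f"{value!r} is not a valid {expected}",
                })

    # Whole-file check: columns that have no value in any data row.
    for index, name in enumerate(header):
        if body and all(index >= len(r) or not r[index].strip() for r in body):
            report["file"].append({"check": "empty_column", "column": name})

    report["summary"] = {
        "rows": len(body),
        "errors": len(report["file"]) + len(report["rows"]),
        "valid": not report["file"] and not report["rows"],
    }
    return report


sample = "id,name,amount,notes\n1,Alice,100,\n2,Bob,abc,\n\n3,Carol\n"
print(json.dumps(validate(sample, SCHEMA), indent=2))
```

Because the report is a plain dict, `json.dumps` gives the JSON form directly, and an HTML or PDF renderer would only need to walk the same structure.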
A related question is whether the validation flow (and the resulting reports) should fail early in certain circumstances.
For example, if row 5 is broken in a 5,000-row CSV, should we stop validating and generate the report, or validate the whole file anyway (assuming it is possible to continue past the broken row)? This could be a configurable part of the validation/report flow.
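One way the fail-early behaviour could be made configurable is a single flag on the validator. The sketch below (Python, standard library only; the `fail_fast` name and the report fields are hypothetical) only checks that each row has as many cells as the header, and it records whether it stopped early so that a partial report is not mistaken for a complete one.

```python
import csv
import io


def validate_rows(text, fail_fast=False):
    """Check that every data row has as many cells as the header.

    fail_fast=True stops at the first broken row; fail_fast=False keeps
    going and collects every error. (Hypothetical option name.)
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    errors = []
    rows_checked = 0
    stopped_on_error = False
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        rows_checked += 1
        if len(row) != len(header):
            errors.append({
                "row": row_number,
                "message": f"{len(row)} cells, expected {len(header)}",
            })
            if fail_fast:
                stopped_on_error = True
                break
    return {
        "rows_checked": rows_checked,
        "errors": errors,
        "stopped_on_error": stopped_on_error,
    }
```

Continuing past a broken row is not always meaningful: an unclosed quote, for instance, can make a non-strict `csv.reader` fold every following line into a single field, so a "keep going" mode may still need to give up on some errors.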
Thanks - I look forward to hearing any comments.