[okfn-labs] New library: Tabular Validator

Paul Walsh paulywalsh at gmail.com
Thu Feb 19 15:52:31 UTC 2015


Hey Friedrich,

TellMe is very simple, and definitely influenced by a discussion we had here a few months ago, where you linked to some code for JSON logging. I think it could be a very useful little utility once we get to a first stable release with Jinja template export, etc.

I don’t have a demo available right now, but over the next week or so I’ll announce the UI/Web API for Tabular Validator, which will have a working implementation.

About existing validation libraries:

I definitely did not set out to reinvent the wheel, and early on I spent quite some time with both validictory and python-jsonschema (https://python-jsonschema.readthedocs.org/en/latest/).

However, a major goal here is a complete, working implementation of the JSON Table Schema spec (hence JTSKit), and doing that via JSON Schema started to feel awkward (JSON Schema would be required by python-jsonschema, and was also how I was using validictory at the time). In the end, I’m not sure that using validictory would have saved me much code here (though I’d have to revisit it to back up that statement :)).

I also set out to use messytables, and I ended up taking some patterns from it (e.g., the type casting in JTSKit), but I don’t use it directly because I am aiming for Python 2/3 support (ref. https://github.com/okfn/messytables/issues/117). I’m hoping to bring more things over from messytables, and/or contribute to Python 3 support in messytables.
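
For illustration, here is a minimal sketch of that type-casting pattern in Python 3. It assumes a JSON Table Schema descriptor whose "fields" entries carry a "name", a "type" and an optional "constraints": {"required": true}. The CASTERS table, the accepted boolean spellings, the ISO-only date format and the cast_row helper are hypothetical and are not JTSKit's actual API.

```python
import datetime


def _cast_boolean(value):
    # Accepted spellings are an assumption for this sketch.
    lowered = value.strip().lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise ValueError("not a boolean: %r" % value)


def _cast_date(value):
    # Only ISO 8601 calendar dates (YYYY-MM-DD) are accepted here.
    return datetime.datetime.strptime(value.strip(), "%Y-%m-%d").date()


# Hypothetical casting table: JSON Table Schema type name -> caster that
# raises ValueError when a raw CSV string cannot be converted.
CASTERS = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": _cast_boolean,
    "date": _cast_date,
}


def cast_row(schema, row):
    """Cast one row of raw strings using schema["fields"]; return (values, errors).

    zip() stops at the shorter sequence, so column-count mismatches must be
    caught by a separate structural check.
    """
    values, errors = [], []
    for field, raw in zip(schema["fields"], row):
        field_type = field.get("type", "string")
        required = field.get("constraints", {}).get("required", False)
        if raw == "":
            # Empty cells become None; they are only an error for required fields.
            if required:
                errors.append("%s: required value is missing" % field["name"])
            values.append(None)
            continue
        caster = CASTERS.get(field_type, str)  # unknown types fall back to string
        try:
            values.append(caster(raw))
        except ValueError:
            errors.append("%s: cannot cast %r to %s" % (field["name"], raw, field_type))
            values.append(None)
    return values, errors


schema = {
    "fields": [
        {"name": "id", "type": "integer", "constraints": {"required": True}},
        {"name": "price", "type": "number"},
        {"name": "active", "type": "boolean"},
        {"name": "updated", "type": "date"},
    ]
}

good = cast_row(schema, ["1", "9.99", "yes", "2015-02-19"])
bad = cast_row(schema, ["", "abc", "maybe", "19/02/2015"])
```

Here good comes back as ([1, 9.99, True, datetime.date(2015, 2, 19)], []), while bad yields four None values and one error message per cell.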

Best,

Paul


> On 19 Feb 2015, at 17:15, Friedrich Lindenberg <friedrich.lindenberg at okfn.org> wrote:
> 
> Good stuff, Paul! I like the "many small modules" approach you're taking there a lot!
> 
> To be honest, the tabular validator reminds me a bit of existing data validation tools, especially https://github.com/sunlightlabs/validictory. Just out of interest: why did you opt against using an existing validation library and wrapping it for reporting? That kind of approach worked quite well for OpenSpending (cf. https://github.com/openspending/osvalidate).
> 
> What I'm really interested in is TellMe, that sounds very cool. Is there a demo application and some example output somewhere that one could look at? 
> 
> Cheers, 
> 
> - Friedrich 
> 
> 
> 
> On Thu, Feb 19, 2015 at 4:53 PM, Paul Walsh <paulywalsh at gmail.com> wrote:
> Hi Labs,
> 
> I want to announce a new library I’ve been working on for Open Knowledge (OK).
> 
> Tabular Validator (https://github.com/okfn/tabular-validator) is a Python package for validating tabular data through a processing pipeline. It is alpha software.
> 
> It is built by Open Knowledge, with funding from the Open Data User Group (https://www.gov.uk/government/groups/open-data-user-group).
> 
> Applications range from simple validation checks on CSV files to integration with a larger ETL pipeline.
> 
> The codebase currently ships with two validators that can be used in a pipeline:
> 
>         • The StructureValidator checks for common structural errors.
>         • The SchemaValidator checks for conformance to a JSON Table Schema.
> 
> There is a hook to add custom validators, and there are plans to include more validators in the core library.
> 
> There is some documentation (http://tabular-validator.readthedocs.org/en/latest/), but it is lacking in some areas. You are welcome to check out the code, run the tests (or check them on Travis), open an issue, or make a pull request to help us iterate to a version one release (here is the backlog).
> 
> We’ve also released some packages that are used in Tabular Validator: TVWeb (https://github.com/okfn/tabular-validator-web), JTSKit (https://github.com/okfn/jtskit-py), and TellMe (https://github.com/okfn/tellme). You can read more about each of these by following the links. A more complete blog post on the Labs blog will follow shortly.
> 
> Thanks,
> 
> Paul
> 
> _______________________________________________
> okfn-labs mailing list
> okfn-labs at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/okfn-labs
> Unsubscribe: https://lists.okfn.org/mailman/options/okfn-labs
> 
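
The announcement quoted above describes validators chained in a pipeline (a StructureValidator, a SchemaValidator) plus a hook for custom validators. A minimal sketch of that pattern in plain Python 3 follows. The Pipeline class, the structure_check and no_negative_amounts functions, the (headers, rows) validator signature and the report shape are all hypothetical and are not the tabular-validator API.

```python
import csv
import io
from collections import Counter


def structure_check(headers, rows):
    """Flag blank/duplicate headers, blank rows and rows with the wrong column count.

    Row numbers are 1-based, with the header row counted as row 1.
    """
    problems = []
    for index, name in enumerate(headers, start=1):
        if not name.strip():
            problems.append((1, "blank header in column %d" % index))
    for name, count in Counter(h for h in headers if h.strip()).items():
        if count > 1:
            problems.append((1, "duplicate header %r" % name))
    for number, row in enumerate(rows, start=2):
        if not any(cell.strip() for cell in row):
            problems.append((number, "blank row"))
        elif len(row) != len(headers):
            problems.append((number, "expected %d columns, found %d" % (len(headers), len(row))))
    return problems


class Pipeline(object):
    """Run each registered validator over the same table and merge the results."""

    def __init__(self, validators=None):
        self.validators = list(validators or [])

    def register(self, validator):
        """Hook for custom validators; returns the validator so it also works as a decorator."""
        self.validators.append(validator)
        return validator

    def run(self, text):
        rows = list(csv.reader(io.StringIO(text, newline="")))
        if not rows:
            return {"valid": False,
                    "results": [{"validator": "pipeline", "row": 0, "message": "file is empty"}]}
        headers, body = rows[0], rows[1:]
        results = []
        for validator in self.validators:
            for row_number, message in validator(headers, body):
                results.append({"validator": validator.__name__,
                                "row": row_number, "message": message})
        return {"valid": not results, "results": results}


pipeline = Pipeline([structure_check])


@pipeline.register
def no_negative_amounts(headers, rows):
    """Hypothetical custom check: values in an "amount" column must not be negative."""
    if "amount" not in headers:
        return []
    column = headers.index("amount")
    problems = []
    for number, row in enumerate(rows, start=2):
        try:
            if float(row[column]) < 0:
                problems.append((number, "negative amount"))
        except (IndexError, ValueError):
            pass  # ragged or non-numeric cells are left to other validators
    return problems


report = pipeline.run("id,amount,id\n1,10,a\n2,-5,b\n\n3\n")
```

With this sample input, report["valid"] is False and the results list the duplicate "id" header (row 1), the blank row 4 and the short row 5 from structure_check, followed by the negative amount on row 3 from the custom validator. Returning a plain dict keeps the report easy to serialise to JSON or feed into a template, roughly the reporting role TellMe is described as playing, though its real output format may differ.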


