[okfn-labs] JSON table schema + CSV
james at floppy.org.uk
Wed Dec 3 08:37:25 UTC 2014
Have you come across our work with http://csvlint.io? It validates CSV against JSON Table Schema (in fact, Tabular Data Package) as you describe, though I don’t *think* we delve into the complex types you mention yet. The core validation code is in a Ruby gem at https://github.com/theodi/csvlint.rb, and we’re always open to improvements, so if you’re interested in adding to that, we’d love to get more people working on it :)
For your question, I think a full-featured validator should check that each field’s value matches its declared type. For instance, a field listed as GeoJSON or geopoint should have its structure checked. As for array, yes, the spec seems vague on that. Perhaps the spec should simply state that it should be a JSON array, as for the types above?
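As a rough illustration of that kind of structural check, here is a minimal Python sketch. It is not how csvlint.rb works; the function name `check_cell` and the per-type rules are assumptions made for the example (in particular, that array, object and geojson cells are stored as JSON text, and that a GeoJSON object only needs a "type" member to pass).

```python
import json


def check_cell(value, field_type):
    """Return True if a raw CSV cell parses as the JSON shape assumed for field_type.

    Hypothetical helper: the rules below are illustrative assumptions,
    not the wording of the JSON Table Schema spec.
    """
    try:
        parsed = json.loads(value)
    except ValueError:
        # Not valid JSON at all (json.JSONDecodeError subclasses ValueError).
        return False
    if field_type == "array":
        return isinstance(parsed, list)
    if field_type == "object":
        return isinstance(parsed, dict)
    if field_type == "geojson":
        # Every GeoJSON object carries a "type" member; deeper checks are omitted.
        return isinstance(parsed, dict) and "type" in parsed
    raise ValueError("unsupported field type: %s" % field_type)
```

A real validator would go further for geojson (checking coordinates against the declared geometry type, for example), but even this shallow check rejects a cell like 1|2|3 for an array field.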
I know the CSV on the Web working group are looking at this stuff as well - see https://w3c.github.io/csvw/ and https://github.com/w3c/csvw - but I can’t see anything in the current docs talking about data types. I suspect that’s left to higher-level standards above CSV, like TDP.
Open Data Institute
> On 3 Dec 2014, at 07:46, Paul Walsh <paulywalsh at gmail.com> wrote:
> I’m working on a JSON table schema validator (spec <http://dataprotocols.org/json-table-schema/>).
> My original intention was to port this Node implementation <https://github.com/okfn/json-table-schema-validator> to Python, but on closer inspection, the Node module does not cover enough of the spec, so I’m no longer “porting”, but writing an implementation using that as an existing example of one.
> My goal is to fully cover the spec, and my primary use case right now is validating CSV files against JSON table schemas.
> CSV as the data source raises issues with several of the types in the spec whose representation is object or array (object/json, array, geopoint, geojson). I’m not aware of any implementations that handle this (correct me if I’m wrong).
> I see two directions:
> 1. Don’t try to handle these types when source is CSV (e.g.: A CSV source could not have a field that is type geopoint)
> 2. Have a spec that describes how implementations MAY parse a CSV field as object or array, pre-validation. Something like:
> * TO_ARRAY (INTRAFIELD_SEPARATOR = '|'): e.g.: value|value|value
> * TO_OBJECT (INTRAFIELD_SEPARATOR = '**', INTRAFIELD_ASSIGNMENT = '='): e.g.: key=value**key=value**key=value
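The two conversions proposed above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names `to_array` and `to_object` and their keyword arguments are made up for the example, all values stay strings, and there is no escaping, so a value that itself contains the separator cannot be represented.

```python
def to_array(value, separator="|"):
    """Split a CSV cell into a list of strings (hypothetical TO_ARRAY)."""
    # An empty cell is treated as an empty array rather than [""].
    return value.split(separator) if value else []


def to_object(value, separator="**", assignment="="):
    """Split a CSV cell into a dict of strings (hypothetical TO_OBJECT)."""
    result = {}
    if not value:
        return result
    for pair in value.split(separator):
        # partition splits on the first assignment character only,
        # so "k=a=b" becomes key "k" with value "a=b".
        key, _, val = pair.partition(assignment)
        result[key] = val
    return result
```

For example, to_array("value|value|value") gives a three-item list, and to_object("key=value**other=thing") gives a dict with the keys "key" and "other". The missing escape mechanism is the main thing a spec for this would have to settle.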
> Any thoughts?