[okfn-labs] JSON table schema + CSV

Paul Walsh paulywalsh at gmail.com
Wed Dec 3 09:00:35 UTC 2014


Hi James,

Yes, I am definitely aware of CSVLint and it is a great project. I’m doing a Python implementation of schema validation and some other modular components for use in the OS ecosystem.

As you say, it does not appear to handle the complex data types in the spec.

I’ll provide an example of what I want to support:

# people.csv with geopoint as array
first_name,home
Paul,32.0934|34.7841

# alt. people.csv with geopoint as object
first_name,home
Paul,lat=32.0934**lon=34.7841

# schema.json
{
    "fields": [
        {"name": "first_name", "type": "string"},
        {"name": "home", "type": "geopoint"}
    ]
}

In the first example, geopoint is represented as an array, so the linter or validator needs to know how to convert “home” into an array before validating it. Without that conversion step, CSV files cannot hold instances of every type described in the spec.
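
As a rough sketch of that conversion step in Python, assuming the separators from the proposal quoted below ('|' for arrays, '**' and '=' for objects) and a lat-then-lon order for the array form, as the two examples above suggest. None of this comes from the spec; the function names are hypothetical.

```python
import csv
import io

# Assumed separators, taken from the proposal in this thread; not part of any spec.
ARRAY_SEPARATOR = "|"
OBJECT_SEPARATOR = "**"
OBJECT_ASSIGNMENT = "="


def to_array(raw):
    """Split a field such as '32.0934|34.7841' into a list of strings."""
    return raw.split(ARRAY_SEPARATOR)


def to_object(raw):
    """Split a field such as 'lat=32.0934**lon=34.7841' into a dict of strings."""
    pairs = (item.split(OBJECT_ASSIGNMENT, 1) for item in raw.split(OBJECT_SEPARATOR))
    return {key: value for key, value in pairs}


def cast_geopoint(raw):
    """Return (lat, lon) as floats from either representation, or raise ValueError."""
    try:
        if OBJECT_ASSIGNMENT in raw:
            obj = to_object(raw)
            lat, lon = obj["lat"], obj["lon"]
        else:
            # Assumes the array form is ordered lat, lon.
            lat, lon = to_array(raw)
        return float(lat), float(lon)
    except (KeyError, ValueError) as exc:
        raise ValueError("not a geopoint: %r" % raw) from exc


def read_people(text):
    """Parse CSV text and convert the 'home' column before any schema validation."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["home"] = cast_geopoint(row["home"])
        rows.append(row)
    return rows
```

Calling read_people("first_name,home\nPaul,32.0934|34.7841\n") would yield one row whose "home" value is the tuple (32.0934, 34.7841); the object form of the same row gives the same result.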

Obviously some objects (e.g.: polygons) would be very unwieldy if represented in CSV like this: a reference property would likely be a better option in some cases. 


> On 3 Dec 2014, at 10:37, James Smith <james at floppy.org.uk> wrote:
> 
> Hi Paul,
> 
> Have you come across our work with http://csvlint.io? It validates CSV against JSON Table Schema (in fact, Tabular Data Package) as you describe, though I don’t *think* we delve into the complex types yet that you mention. The core validation code is in a Ruby gem at https://github.com/theodi/csvlint.rb, and we’re always open to improvements, so if you’re interested in adding to that, we’d love to get more people working on it :)
> 
> For your question, I think a full-featured validator should check that the fields match what they are supposed to be. For instance, a field listed as GeoJSON or geopoint should have its structure checked. As for array, yes, the spec seems vague on that. Perhaps the spec should simply state that it should be a JSON array, as for the types above?
> 
> I know the CSV on the Web working group are looking at this stuff as well - see https://w3c.github.io/csvw/ and https://github.com/w3c/csvw, but I can’t see anything in the current docs talking about data types - I suspect that’s left to higher-level standards above CSV like TDP.
> 
> cheers,
> James Smith
> Open Data Institute
> 
>> On 3 Dec 2014, at 07:46, Paul Walsh <paulywalsh at gmail.com> wrote:
>> 
>> Hi,
>> 
>> I’m working on a JSON table schema validator (spec <http://dataprotocols.org/json-table-schema/>). 
>> 
>> My original intention was to port this Node implementation <https://github.com/okfn/json-table-schema-validator> to Python, but on closer inspection, the Node module does not cover enough of the spec, so I’m no longer “porting”, but writing an implementation using that as an existing example of one.
>> 
>> My goal is to fully cover the spec, and my primary use case right now is validating CSV files against JSON table schemas. 
>> 
>> CSV as the data source raises issues with several of the types in the spec whose representation is object or array (object/json, array, geopoint, geojson). I’m not aware of any implementations that handle this (correct me if I’m wrong). 
>> 
>> I see two directions:
>> 
>> 1. Don’t try to handle these types when the source is CSV (e.g.: a CSV source could not have a field of type geopoint)
>> 2. Have a spec that describes how implementations MAY parse a CSV field as object or array, pre-validation. Something like:
>>     * TO_ARRAY (INTRAFIELD_SEPARATOR = '|'), e.g.: value|value|value
>>     * TO_OBJECT (INTRAFIELD_SEPARATOR = '**', INTRAFIELD_ASSIGNMENT = '='): e.g.: key=value**key=value**key=value
>> 
>> 
>> Any thoughts?
>> 
>> Paul
> 
