[okfn-labs] JSON table schema + CSV

Tryggvi Björgvinsson tryggvi.bjorgvinsson at okfn.org
Wed Dec 3 09:36:33 UTC 2014


Here are my two cents on this.

I agree that the spec is too vague on this.

I would try as much as possible to avoid adding a new delimiter to the
CSV. At the very least, I would not expect a validator to enforce an
unwritten rule on CSVs. Since the spec doesn't name | or ** or anything
else as a delimiter, I don't think we can validate against it.

The spec does give some examples:

>   * *object*: (alias json) a JSON-encoded object
>   * *geopoint*: has one of the following structures:
>
>         { lon: ..., lat: ... }
>
>         [lon,lat]
>
>         "lon, lat"
>
>   * *geojson*: as per <http://geojson.org/>
>   * *array*: an array
>

I think geopoint is the simplest of these. You can just encode it in the
CSV according to one of those structures (even if it looks weird):

name,home
Paul,"32.0934, 34.7841"
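
Reading such a cell needs nothing beyond a normal CSV parser, because
the quoting keeps the comma inside the field. A minimal sketch in
Python, assuming the "lon, lat" order from the spec excerpt above (the
helper name parse_geopoint_text is made up for illustration and is not
part of the spec or of any library):

```python
import csv
import io

CSV_TEXT = 'name,home\nPaul,"32.0934, 34.7841"\n'


def parse_geopoint_text(value):
    """Split a "lon, lat" string into two floats (hypothetical helper)."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 2:
        raise ValueError("expected two comma-separated numbers: %r" % value)
    # Assumption: the first number is lon and the second is lat.
    return float(parts[0]), float(parts[1])


# The csv module handles the quoted field, so "home" arrives as one cell.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
home = parse_geopoint_text(rows[0]["home"])
print(home)
```

The point is that no extra delimiter is involved: the CSV layer and the
geopoint layer are parsed separately.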

You could also incorporate JSON in the CSV, although that looks very
ugly and makes me shudder:

name,home
Paul,"[32.0934, 34.7841]"

or

name,home
Paul,"{lon:32.0934, lat:34.7841}"

That said, this would then also work for object (or json if you use the
alias). It would also work for geojson and the array could be
represented as a json array.
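
To make that concrete, here is a rough Python sketch of the
JSON-in-a-cell approach. It assumes strict JSON inside the cell (so the
keys are quoted), and the decode_cell helper, the set of type names it
checks, and the extra "tags" column are my own illustration rather than
anything the spec defines:

```python
import csv
import io
import json

# Types assumed (for this sketch only) to be carried as JSON text in a cell.
JSON_TYPES = {"object", "json", "array", "geojson"}


def decode_cell(value, field_type):
    """Decode a CSV cell according to its schema type (hypothetical helper)."""
    if field_type in JSON_TYPES:
        return json.loads(value)
    return value


# Writing: csv.writer doubles the inner quotes so the JSON survives intact.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "home", "tags"])
writer.writerow([
    "Paul",
    json.dumps({"lon": 32.0934, "lat": 34.7841}),
    json.dumps(["a", "b"]),
])

# Reading: parse the CSV first, then decode each cell by its declared type.
types = {"name": "string", "home": "object", "tags": "array"}
buffer.seek(0)
row = next(csv.DictReader(buffer))
decoded = {key: decode_cell(value, types[key]) for key, value in row.items()}
print(buffer.getvalue())
print(decoded)
```

The ugly part shows up in the raw file, where every quote inside the
JSON gets doubled by the CSV escaping.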

Like I said, I don't like mixing these two, and I wish the spec were
clearer on this instead of just throwing it out there. But following the
spec's structures is the only thing I feel can be done without the
validator becoming too specific by defining its own rules (which it
shouldn't).

Perhaps it is worth raising this in the issue tracker for dataprotocols:
https://github.com/dataprotocols/dataprotocols

/Tryggvi

On Wed 3 Dec 2014 09:00, Paul Walsh wrote:
> Hi James,
>
> Yes, I am definitely aware of CSVLint and it is a great project. I’m
> doing a Python implementation of schema validation and some other
> modular components for use in the OS ecosystem.
>
> As you say, it does not appear to handle the complex data types in the
> spec.
>
> I’ll provide an example of what I want to support:
>
> # people.csv with geopoint as array
> first_name,home
> Paul,32.0934|34.7841
>
> # alt. people.csv with geopoint as object
> first_name,home
> Paul,lat=32.0934**lon=34.7841
>
> # schema.csv
> {
>     "fields": [
>         {"name": "first_name", "type": "string"},
>         {"name": "home", "type": "geopoint"}
>     ]
> }
>
> Geopoint is an array, so the linter or validator would need to know
> how to convert “home” into an array, in order for CSV files to be able
> to have instances of all types described in the spec.
>
> Obviously some objects (e.g.: polygons) would be very unwieldy if
> represented in CSV like this: a reference property would likely be a
> better option in some cases. 
>
>
>> On 3 Dec 2014, at 10:37, James Smith <james at floppy.org.uk> wrote:
>>
>> Hi Paul,
>>
>> Have you come across our work with http://csvlint.io? It validates
>> CSV against JSON Table Schema (in
>> fact, Tabular Data Package) as you describe, though I don’t *think*
>> we delve into the complex types yet that you mention. The core
>> validation code is in a ruby gem
>> at https://github.com/theodi/csvlint.rb, and we’re always open to
>> improvements, so if you’re interested in adding to that, we’d love to
>> get more people working on it :)
>>
>> For your question, I think a full-featured validator should check
>> that the fields match what they are supposed to be. For instance, a
>> field listed as GeoJSON or geopoint should be checked that its
>> structure is correct. As for array, yes, the spec seems vague on
>> that. Perhaps the spec should simply state that it should be a JSON
>> array as for the types above? 
>>
>> I know the CSV on the Web working group are looking at this stuff as
>> well -
>> see https://w3c.github.io/csvw/ and https://github.com/w3c/csvw, but
>> I can’t see anything in the current docs talking about data types - I
>> suspect that’s left to higher-level standards above CSV like TDP.
>>
>> cheers,
>> James Smith
>> Open Data Institute
>>
>>> On 3 Dec 2014, at 07:46, Paul Walsh <paulywalsh at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I’m working on a JSON table schema validator (spec
>>> <http://dataprotocols.org/json-table-schema/>). 
>>>
>>> My original intention was to port this Node implementation
>>> <https://github.com/okfn/json-table-schema-validator> to Python, but
>>> on closer inspection, the Node module does not cover enough of the
>>> spec, so I’m no longer “porting”, but writing an implementation
>>> using that as an existing example of one.
>>>
>>> My goal is to fully cover the spec, and my primary use case right
>>> now is validating CSV files against JSON table schemas. 
>>>
>>> CSV as the data source raises issues with several of the types in
>>> the spec whose representation is object or array (object/json,
>>> array, geopoint, geojson). I’m not aware of any implementations that
>>> handle this (correct me if I’m wrong). 
>>>
>>> I see two directions:
>>>
>>> 1. Don’t try to handle these types when source is CSV (e.g.: A CSV
>>> source could not have a field that is type geopoint)
>>> 2. Have a spec that describes how implementations MAY parse a CSV
>>> field as object or array, pre-validation. Something like:
>>>     * TO_ARRAY (INTRAFIELD_SEPARATOR = '|'), e.g.: value|value|value
>>>     * TO_OBJECT (INTRAFIELD_SEPARATOR = '**', INTRAFIELD_ASSIGNMENT
>>> = '='): e.g.: key=value**key=value**key=value
>>>
>>>
>>> Any thoughts?
>>>
>>> Paul
>>> _______________________________________________
>>> okfn-labs mailing list
>>> okfn-labs at lists.okfn.org <mailto:okfn-labs at lists.okfn.org>
>>> https://lists.okfn.org/mailman/listinfo/okfn-labs
>>> Unsubscribe: https://lists.okfn.org/mailman/options/okfn-labs
