[data-protocols] Thoughts on JSON Table Schema

Paul Walsh paulywalsh at gmail.com
Mon Jul 6 18:28:58 UTC 2015


Hi Alex,

Well, the way I see it, JSON Table Schema is trying to get to some rough consensus on the minimal viable information we need to describe tabular data in a way that is generally applicable to a wide range of input/output formats.

It can’t be specific, because then the specificity never ends - today Redshift, tomorrow…

However, I can see the utility of an extra document that provides additional information as you describe. I think the best way to start tackling this is to pitch the idea with some examples of how it may look. We could do it here on the list, but the issue tracker may be better, as most discussion on ideas like this occurs there: https://github.com/dataprotocols/dataprotocols/issues
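
As a purely illustrative sketch of what such a pitch might contain: a generic JSON Table Schema descriptor paired with a separate, database-specific document, plus a small function that combines the two into DDL. Everything beyond the basic "fields" / "name" / "type" structure is an assumption made for this example: the shape of the extras document (the "table", "distkey" and "sortkey" keys), the type mapping, and the function name are hypothetical and not part of any spec.

```python
# Core descriptor: plain JSON Table Schema (a list of named, typed fields).
core_schema = {
    "fields": [
        {"name": "event_id", "type": "string"},
        {"name": "collector_tstamp", "type": "datetime"},
        {"name": "value", "type": "number"},
    ]
}

# Hypothetical companion document carrying Redshift-only information.
# These keys are invented for illustration; no spec defines them.
redshift_extras = {
    "table": "events",
    "distkey": "event_id",
    "sortkey": ["collector_tstamp"],
}

# Assumed mapping from JSON Table Schema types to Redshift column types.
REDSHIFT_TYPES = {
    "string": "VARCHAR(255)",
    "number": "DOUBLE PRECISION",
    "integer": "BIGINT",
    "boolean": "BOOLEAN",
    "datetime": "TIMESTAMP",
    "date": "DATE",
}


def to_redshift_ddl(schema, extras):
    """Combine a generic schema and Redshift extras into a CREATE TABLE statement."""
    columns = []
    for field in schema["fields"]:
        # Fields without an explicit type are treated as strings in this sketch.
        sql_type = REDSHIFT_TYPES[field.get("type", "string")]
        columns.append("  {} {}".format(field["name"], sql_type))
    ddl = "CREATE TABLE {} (\n{}\n)".format(extras["table"], ",\n".join(columns))
    # Both keys are treated as optional in this sketch.
    if "distkey" in extras:
        ddl += "\nDISTKEY ({})".format(extras["distkey"])
    if extras.get("sortkey"):
        ddl += "\nSORTKEY ({})".format(", ".join(extras["sortkey"]))
    return ddl + ";"


if __name__ == "__main__":
    print(to_redshift_ddl(core_schema, redshift_extras))
```

The point of splitting things this way is that the core descriptor stays usable by any tool that understands JSON Table Schema, while a Postgres or BigQuery generator would simply read a different extras document.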

Best,

Paul


 
> On 6 Jul 2015, at 21:15, Alex Dean <alex at snowplowanalytics.com> wrote:
> 
> Thanks Paul! Any thoughts on my other, more rambling point?
> 
> Cheers,
> 
> Alex
> 
> On Mon, Jul 6, 2015 at 6:56 PM, Paul Walsh <paulywalsh at gmail.com> wrote:
> Hi,
> 
> For the quick question: a JSON Schema of JSON Table Schema is here: https://github.com/dataprotocols/schemas/blob/master/json-table-schema.json
> 
> Best,
> 
> Paul
> 
>> On 6 Jul 2015, at 20:35, Alex Dean <alex at snowplowanalytics.com> wrote:
>> 
>> Hi,
>> 
>> First, can I say that I am a long-time follower and huge fan of the dataprotocols.org project.
>> 
>> At Snowplow we are thinking of using JSON Table Schema in our Iglu schema repository system:
>> 
>> https://github.com/snowplow/iglu
>> 
>> First a quick question - I couldn't find a JSON Schema for the JSON Table Schema. Has anybody written this yet?
>> 
>> More broadly: I'm not convinced that the current unitary JSON Table Schema is a viable approach.
>> 
>> Different relational databases have different capabilities - for example, a valid table definition for Redshift must have SORTKEY and DISTKEY, and indexes are not supported. This is distinct from Postgres DDL, which in turn is distinct from BigQuery DDL, Vertica DDL etc.
>> 
>> For me, the value of a JSON Table Schema would be in making table DDL declarative and composable. To be useful though, it must be possible to generate valid idiomatic (i.e. database-specific) DDL from a given instance of a JSON Table Schema.
>> 
>> Based on this, I'm leaning towards a JSON Table Schema which has database-specific flavors. I think the two options here are:
>> 
>> 1. Create a separate definition document (in JSON Schema) for each database that we want to support, or
>> 2. Create a unitary JSON Table Schema which uses enums of e.g. database-specific field-descriptor types to support differences.
>> 
>> The downside of the first option is that there is no guaranteed predictability of schema shape between different database types. The second option is a little more fiddly but probably more useful long-term.
>> 
>> Does anybody have any thoughts on the above?
>> 
>> Thanks,
>> 
>> Alex
>> 
>> -- 
>> Co-founder
>> Snowplow Analytics <http://snowplowanalytics.com/>
>> The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
>> +44 (0)203 589 6116
>> @alexcrdean <https://twitter.com/alexcrdean>