[data-protocols] Thoughts on JSON Table Schema
Paul Walsh
paulywalsh at gmail.com
Mon Jul 6 19:35:06 UTC 2015
The main use case would be providing a schema for data in CSV files, or other text-based files. The schema therefore provides the metadata required to cast values into appropriate data types in some programming language that is handling the data.
> On 6 Jul 2015, at 21:47, Alex Dean <alex at snowplowanalytics.com> wrote:
>
> Hi Paul,
>
> Ah - thanks for that, super-helpful. I have created a ticket here:
>
> https://github.com/dataprotocols/dataprotocols/issues/205 <https://github.com/dataprotocols/dataprotocols/issues/205>
>
> I think I am probably a bit blinkered coming at this from a DDL-generation viewpoint - what are the main use cases for a generic JSON Table Schema which cannot generate db-specific DDL?
>
> Cheers,
>
> Alex
>
> On Mon, Jul 6, 2015 at 7:28 PM, Paul Walsh <paulywalsh at gmail.com <mailto:paulywalsh at gmail.com>> wrote:
> Hi Alex,
>
> Well, the way I see it, JSON Table Schema is trying to get to some rough consensus on the minimal viable information we need to describe tabular data in a way that is generally applicable to a wide range of input/output formats.
>
> It can’t be specific, because then the specificity never ends - today Redshift, tomorrow…
>
> However, I can see the utility of an extra document that provides additional information as you describe. I think the best way to start tackling this is to pitch the idea with some examples of how it may look. We could do it here on the list, but the issue tracker may be better, as most discussion on ideas like this occurs there: https://github.com/dataprotocols/dataprotocols/issues <https://github.com/dataprotocols/dataprotocols/issues>
>
> Best,
>
> Paul
>
>
>
>> On 6 Jul 2015, at 21:15, Alex Dean <alex at snowplowanalytics.com <mailto:alex at snowplowanalytics.com>> wrote:
>>
>> Thanks Paul! Any thoughts on my other, more rambling point?
>>
>> Cheers,
>>
>> Alex
>>
>> On Mon, Jul 6, 2015 at 6:56 PM, Paul Walsh <paulywalsh at gmail.com <mailto:paulywalsh at gmail.com>> wrote:
>> Hi,
>>
>> For the quick question: a JSON Schema of JSON Table Schema is here: https://github.com/dataprotocols/schemas/blob/master/json-table-schema.json <https://github.com/dataprotocols/schemas/blob/master/json-table-schema.json>
>>
>> Best,
>>
>> Paul
>>
>>> On 6 Jul 2015, at 20:35, Alex Dean <alex at snowplowanalytics.com <mailto:alex at snowplowanalytics.com>> wrote:
>>>
>>> Hi,
>>>
>>> First can I say I am a long-time follower and huge fan of the dataprotocols.org <http://dataprotocols.org/> project.
>>>
>>> At Snowplow we are thinking of using JSON Table Schema in our Iglu schema repository system:
>>>
>>> https://github.com/snowplow/iglu <https://github.com/snowplow/iglu>
>>>
>>> First a quick question - I couldn't find a JSON Schema for the JSON Table Schema. Has anybody written this yet?
>>>
>>> More broadly: I'm not convinced that the current unitary JSON Table Schema is a viable approach.
>>>
>>> Different relational databases have different capabilities - for example, a valid table definition for Redshift must have SORTKEY and DISTKEY, and indexes are not supported. This is distinct from Postgres DDL, which in turn is distinct from BigQuery DDL, Vertica DDL etc.
>>>
>>> For me, the value of a JSON Table Schema would be in making table DDL declarative and composable. To be useful though, it must be possible to generate valid idiomatic (i.e. database-specific) DDL from a given instance of a JSON Table Schema.
>>>
>>> Based on this, I'm leaning towards a JSON Table Schema which has database-specific flavors. I think the two options here are:
>>> Create a separate definition document (in JSON Schema) for each database that we want to support, or
>>> Create a unitary JSON Table Schema which uses enums of e.g. database-specific field-descriptor types to support differences
>>> The downside of the first option is that there is no guaranteed predictability of schema shape between different database types. The second option is a little more fiddly but probably more useful long-term.
>>>
>>> Does anybody have any thoughts on the above?
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>> --
>>> Co-founder
>>> Snowplow Analytics <http://snowplowanalytics.com/>
>>> The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
>>> +44 (0)203 589 6116 <>
>>> @alexcrdean <https://twitter.com/alexcrdean>_______________________________________________
>>> data-protocols mailing list
>>> data-protocols at lists.okfn.org <mailto:data-protocols at lists.okfn.org>
>>> https://lists.okfn.org/mailman/listinfo/data-protocols <https://lists.okfn.org/mailman/listinfo/data-protocols>
>>> Unsubscribe: https://lists.okfn.org/mailman/options/data-protocols <https://lists.okfn.org/mailman/options/data-protocols>
>>
>>
>>
>>
>> --
>> Co-founder
>> Snowplow Analytics <http://snowplowanalytics.com/>
>> The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
>> +44 (0)203 589 6116 <>
>> +44 7881 622 925 <>
>> @alexcrdean <https://twitter.com/alexcrdean>
>
>
>
> --
> Co-founder
> Snowplow Analytics <http://snowplowanalytics.com/>
> The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
> +44 (0)203 589 6116 <>
> +44 7881 622 925 <>
> @alexcrdean <https://twitter.com/alexcrdean>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.okfn.org/pipermail/data-protocols/attachments/20150706/b7e8ac34/attachment-0001.html>
More information about the data-protocols
mailing list