[data-protocols] Thoughts on JSON Table Schema

Alex Dean alex at snowplowanalytics.com
Mon Jul 6 20:33:21 UTC 2015


Ah - that's made it click for me, thanks Paul.

So I think really the existing schema is the "Tabular Data Schema" - in
other words, a way of schema'ing tabular data as stored in T/CSVs so that
consuming processes can validate that a given slice of tabular data
conforms to its schema. It's a way of describing tabular data, but it's not
attempting to describe any separate vessel (table) for that tabular data to
reside in. Completely makes sense.

What I want to create is the "Database Table Schema", which is a way of
expressing relational/columnar database table structures in a declarative
way. Kind of like an AST but for tables.

I think my confusion was due to a) the name of the existing schema and b)
the inclusion of primary and foreign keys, which don't feel to me like they
belong in a "Tabular Data Schema" (as they stray into modeling presumably
the previous home of the tabular data rather than intrinsic properties of
the tabular data itself).

Cheers,

Alex



On Mon, Jul 6, 2015 at 8:35 PM, Paul Walsh <paulywalsh at gmail.com> wrote:

> The main use case would be providing a schema for data in CSV files, or
> other text-based files. The schema therefore provides the metadata required
> to cast values into appropriate data types in some programming language
> that is handling the data.
>
>
> On 6 Jul 2015, at 21:47, Alex Dean <alex at snowplowanalytics.com> wrote:
>
> Hi Paul,
>
> Ah - thanks for that, super-helpful. I have created a ticket here:
>
> https://github.com/dataprotocols/dataprotocols/issues/205
>
> I think I am probably a bit blinkered coming at this from a DDL-generation
> viewpoint - what are the main use cases for a generic JSON Table Schema
> which cannot generate db-specific DDL?
>
> Cheers,
>
> Alex
>
> On Mon, Jul 6, 2015 at 7:28 PM, Paul Walsh <paulywalsh at gmail.com> wrote:
>
>> Hi Alex,
>>
>> Well, the way I see it, JSON Table Schema is trying to get to some rough
>> consensus on the minimal viable information we need to describe tabular
>> data in a way that is generally applicable to a wide range of input/output
>> formats.
>>
>> It can’t be specific, because then the specificity never ends - today
>> Redshift, tomorrow…
>>
>> However, I can see the utility of an extra document that provides
>> additional information as you describe. I think the best way to start
>> tackling this is to pitch the idea with some examples of how it may look.
>> We could do it here on the list, but the issue tracker may be better, as
>> most discussion on ideas like this occurs there:
>> https://github.com/dataprotocols/dataprotocols/issues
>>
>> Best,
>>
>> Paul
>>
>>
>>
>>
>> On 6 Jul 2015, at 21:15, Alex Dean <alex at snowplowanalytics.com> wrote:
>>
>> Thanks Paul! Any thoughts on my other, more rambling point?
>>
>> Cheers,
>>
>> Alex
>>
>> On Mon, Jul 6, 2015 at 6:56 PM, Paul Walsh <paulywalsh at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> For the quick question: a JSON Schema of JSON Table Schema is here:
>>> https://github.com/dataprotocols/schemas/blob/master/json-table-schema.json
>>>
>>> Best,
>>>
>>> Paul
>>>
>>> On 6 Jul 2015, at 20:35, Alex Dean <alex at snowplowanalytics.com> wrote:
>>>
>>> Hi,
>>>
>>> First can I say I am a long-time follower and huge fan of the
>>> dataprotocols.org project.
>>>
>>> At Snowplow we are thinking of using JSON Table Schema in our Iglu
>>> schema repository system:
>>>
>>> https://github.com/snowplow/iglu
>>>
>>> First a quick question - I couldn't find a JSON Schema for the JSON
>>> Table Schema. Has anybody written this yet?
>>>
>>> More broadly: I'm not convinced that the current unitary JSON Table
>>> Schema is a viable approach.
>>>
>>> Different relational databases have different capabilities - for
>>> example, a valid table definition for Redshift must have SORTKEY and
>>> DISTKEY, and indexes are not supported. This is distinct from Postgres DDL,
>>> which in turn is distinct from BigQuery DDL, Vertica DDL etc.
>>>
>>> For me, the value of a JSON Table Schema would be in making table DDL
>>> declarative and composable. To be useful though, it must be possible to
>>> generate valid idiomatic (i.e. database-specific) DDL from a given instance
>>> of a JSON Table Schema.
>>>
>>> Based on this, I'm leaning towards a JSON Table Schema which has
>>> database-specific flavors. I think the two options here are:
>>>
>>>    1. Create a separate definition document (in JSON Schema) for each
>>>    database that we want to support, or
>>>    2. Create a unitary JSON Table Schema which uses enums of e.g.
>>>    database-specific field-descriptor types to support differences
>>>
>>> The downside of the first option is that there is no guaranteed
>>> predictability of schema shape between different database types. The second
>>> option is a little more fiddly but probably more useful long-term.
>>>
>>> Does anybody have any thoughts on the above?
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>> --
>>> Co-founder
>>> Snowplow Analytics <http://snowplowanalytics.com/>
>>> The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
>>> +44 (0)203 589 6116
>>> @alexcrdean <https://twitter.com/alexcrdean>
>>>  _______________________________________________
>>> data-protocols mailing list
>>> data-protocols at lists.okfn.org
>>> https://lists.okfn.org/mailman/listinfo/data-protocols
>>> Unsubscribe: https://lists.okfn.org/mailman/options/data-protocols
>>>
>>>
>>>
>>
>>
>> --
>> Co-founder
>> Snowplow Analytics <http://snowplowanalytics.com/>
>> The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
>> +44 (0)203 589 6116
>> +44 7881 622 925
>> @alexcrdean <https://twitter.com/alexcrdean>
>>
>>
>>
>
>
> --
> Co-founder
> Snowplow Analytics <http://snowplowanalytics.com/>
> The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
> +44 (0)203 589 6116
> +44 7881 622 925
> @alexcrdean <https://twitter.com/alexcrdean>
>
>
>


-- 
Co-founder
Snowplow Analytics <http://snowplowanalytics.com/>
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean <https://twitter.com/alexcrdean>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.okfn.org/pipermail/data-protocols/attachments/20150706/4dc2a10e/attachment.html>


More information about the data-protocols mailing list