[data-protocols] RFC: JSON Table Schema

Martin Keegan martin.keegan at okfn.org
Thu Nov 29 16:00:01 GMT 2012


On Mon, Nov 26, 2012 at 10:34 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:

> I've been working on a simple schema for tabular data. The schema is
> designed to be expressible in JSON.
>
> http://www.dataprotocols.org/en/latest/json-table-schema.html
>
> This is still incomplete (e.g. need to have format specified in more
> detail) but I'd be very interested in any feedback or thoughts (e.g.
> is this re-inventing the wheel - if so what is better?).

It would be good to get some background on the use cases you envision
for this.

One small niggle: you have made fields a linear array rather than an
associative array, such that it's syntactically possible to have two
fields with the same name: is this intentional?
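
To make the niggle concrete, here is a minimal sketch of the check a
consumer would need while fields stay a list. It assumes each entry in
"fields" is an object with a "name" key; the sample schema and the
function name are hypothetical.

```python
from collections import Counter

def duplicate_field_names(schema):
    """Return the field names that occur more than once in schema["fields"]."""
    counts = Counter(field["name"] for field in schema["fields"])
    return sorted(name for name, n in counts.items() if n > 1)

# Hypothetical schema: well-formed JSON, but "id" is declared twice.
schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "title", "type": "string"},
        {"name": "id", "type": "string"},
    ]
}

duplicates = duplicate_field_names(schema)
```

With an associative array keyed by name, this situation could not be
written down in the first place, at the cost of losing column order.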

The schema allows you to specify a type as "object" or "array" or
"any"; in some use cases this is tantamount to *not* having a schema
in the first place. What's the intended use of these?

More generally: in my opinion, componentisation of knowledge is an
important desideratum, and it would be well served by a composable
type system that lets people incorporate each other's schemas by
reference. You could then create a schema which specifies that
everything has to be an object whose keys and values must be objects
conforming to two other schemas maintained by two external parties.
For example, the key must be a constituency of the Westminster
parliament and the value a political party represented therein, where
you use mySociety's database of constituencies but a list of parties
maintained by David Boothroyd. This is intended as the analogue of a
cross table with foreign key constraints in SQL terms.

Delightfully, I have a working implementation of this already.
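
The following is not the implementation mentioned above, only a
minimal hypothetical sketch of what incorporating schemas by reference
could look like. The registry, the "ref" and "enum" keys, and all
sample values are invented for illustration.

```python
# Hypothetical registry standing in for schemas published by two
# external parties. Identifiers and values are placeholders.
REGISTRY = {
    "constituencies": {"enum": ["Constituency A", "Constituency B"]},
    "parties": {"enum": ["Party X", "Party Y"]},
}

# This schema owns no value lists itself; it only refers to the two above.
SEATS = {"keys": {"ref": "constituencies"}, "values": {"ref": "parties"}}

def resolve(schema, registry):
    """Follow a {"ref": name} indirection to the schema it names."""
    return registry[schema["ref"]] if "ref" in schema else schema

def conforms(mapping, schema, registry):
    """True if every key and every value is allowed by the referenced schemas."""
    allowed_keys = resolve(schema["keys"], registry)["enum"]
    allowed_values = resolve(schema["values"], registry)["enum"]
    return all(
        key in allowed_keys and value in allowed_values
        for key, value in mapping.items()
    )

ok = conforms({"Constituency A": "Party X"}, SEATS, REGISTRY)
bad = conforms({"Constituency A": "Party Z"}, SEATS, REGISTRY)
```

The second mapping is rejected because "Party Z" is not in the
referenced list of parties, much as a foreign key constraint would
reject it.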

Stepping outside the relational model, it is worth consciously ruling
in or ruling out the use of algebraic types, that is, the ability to
say that a value is one and only one of several possible subtypes. For
example, a card in a deck might be a Joker, a Club of rank <int>, a
Spade of rank <int>, a Diamond of rank <int> or a Heart of rank <int>,
without specifying a rank for the joker.
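
As a sketch of the card example, here is one way to approximate an
algebraic type in Python. The class and function names are assumptions,
and Python only checks the Union under a static type checker, not at
runtime.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Joker:
    pass  # a joker carries no rank

@dataclass(frozen=True)
class Club:
    rank: int

@dataclass(frozen=True)
class Spade:
    rank: int

@dataclass(frozen=True)
class Diamond:
    rank: int

@dataclass(frozen=True)
class Heart:
    rank: int

# A card is exactly one of these alternatives.
Card = Union[Joker, Club, Spade, Diamond, Heart]

def describe(card: Card) -> str:
    """Render a card, handling the rankless Joker separately."""
    if isinstance(card, Joker):
        return "Joker"
    return f"{type(card).__name__} of rank {card.rank}"
```

Here describe(Spade(7)) yields "Spade of rank 7", while Joker() has no
rank attribute to consult at all.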

Algebraic types can be used for saying "optionally, this value is
missing", and along with arrays and schema'd objects should allow you
to model just about anything, particularly family trees.
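
A family tree shows how the "optionally missing" case combines with a
recursive object type. This is a minimal sketch with hypothetical names
and sample people; a parent who is not recorded is simply None.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    name: str
    mother: Optional["Person"] = None  # None means "not recorded"
    father: Optional["Person"] = None

def ancestors(person):
    """Names of all recorded ancestors, depth-first, mother's side first."""
    found = []
    for parent in (person.mother, person.father):
        if parent is not None:
            found.append(parent.name)
            found.extend(ancestors(parent))
    return found

grandmother = Person("Ann")
mother = Person("Beth", mother=grandmother)
child = Person("Carl", mother=mother)

names = ancestors(child)
```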

Please also consider adding Piqi (https://github.com/alavrik/piqi) to
the related work section.

Mk


