[data-protocols] Simple Data Format: straw man

Nick Stenning nick at whiteink.com
Tue May 15 12:24:58 BST 2012


> On Tuesday, May 15, 2012 11:18:56 AM Chris Taggart <countculture at gmail.com> wrote:
>
> Only allowing CSV for the data ... because it's heavily nested (... there's no practical
> benefit to normalising it into individual tables).

Not sure I follow this. Either you normalise it into individual tables
(not terribly easy, but a must for key concepts) or you leave it
denormalised, in either JSON or CSV format. Is there really a problem
with converting between

    { "category": { "id": 4, "name": "foobar" } }

and

    category.id,category.name
    4,foobar

?
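
For what it's worth, the round trip looks mechanical to me. A minimal
sketch in Python, assuming dotted column names as in the example above,
that keys never contain a literal ".", and that nested values are
objects rather than lists (lists would need a convention of their own).
The names flatten, unflatten and to_csv are illustrative only, not from
any library:

```python
import csv
import io


def flatten(record, prefix=""):
    """Turn nested dicts into one flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat):
    """Rebuild nested dicts from dotted keys."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def to_csv(records):
    """Write flattened records as CSV text (header taken from the first record)."""
    rows = [flatten(r) for r in records]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


records = [{"category": {"id": 4, "name": "foobar"}}]
csv_text = to_csv(records)
```

One caveat: CSV carries no types, so reading the file back gives "4"
rather than 4, and the type information has to come from the schema.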

Anyway, in this vein, I've also put together the start of a
DSPL-inspired data description format, although mine is not so much
"inspired" by DSPL as lifted wholesale from it. The main "features"
are:

1) The schema is written in JSON. Parsing anything else on the client
side is a nightmare, and I'd like to extend the OpenSpending model
editor to be able to create these dataset-description schemas.

2) I don't care about reusability of *concepts* across datasets.
Semantic web be damned: if dataset owners have to spend two days
working out which namespace to use to describe their data, they won't
do it.

At the moment it's called DSPL JSON: see
https://github.com/nickstenning/dspljson. Comments and criticisms of
course welcomed on this too!

And now a few comments on SDF:

> The format’s focus is on simplicity and web usage – that is, usage
> online with access and transmission over HTTP.

I think you should probably stress this differently. I assume you mean
usage on the client side (i.e. JavaScript parseability) rather than
transport over HTTP: there are no problems with transmitting gzipped
XLS over HTTP, but it's not a very useful data format for what we're
talking about.

Other things that occur to me on reading the SDF web page:

1) You stress that it's inspired by DSPL, but it doesn't actually
appear to share a great deal with DSPL's data model. Where are the
distinctions between tables and slices? Can I create a hierarchy of
dimensions using something like DSPL's "topics"? Is there any support
for column mapping like DSPL's slice tablerefs? I'm not suggesting you
should add all these features; I'm just not sure what you've actually
taken from DSPL other than the idea of a format for describing a dataset.

2) The choice you've made that causes me most concern is to have a
schema file per data file. That makes actually consuming one of these
datasets substantially more difficult for a client-side application.
How do I know which schema files exist to start with? Will I need to
create my own "index.json"? If so, you should specify the format of
that file too.

3) As suggested earlier, I'm deeply dubious of the linked data aspects
of DSPL. You've gone for compatibility with JSON-LD, which steals the
"@type" attribute from you, reducing the piece of information that
*really* matters (@simpletype, simpletype, or simple_type, depending
where in the documentation you look) to a second-class citizen. Will
anyone really implement support for the @type field?
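
To make point 2 concrete, here is a sketch of the kind of index a
consumer would otherwise have to invent. Only the name "index.json"
comes from the text above; the "files", "data" and "schema" keys, and
all the file names, are made up for illustration and are not part of
SDF:

```python
import json

# Hypothetical index.json content: every key and file name here is an
# assumption made for this example, not something SDF defines.
INDEX_JSON = """
{
  "files": [
    {"data": "spending.csv", "schema": "spending.schema.json"},
    {"data": "departments.csv", "schema": "departments.schema.json"}
  ]
}
"""


def schema_for_each_data_file(index_text):
    """Map each data file to its schema file, as listed in the index."""
    index = json.loads(index_text)
    return {entry["data"]: entry["schema"] for entry in index["files"]}


pairs = schema_for_each_data_file(INDEX_JSON)
```

With something along these lines a client could learn what exists from
a single file; without it, there is nothing to enumerate.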

I think this is a really important problem to solve, hence my toying
with dspljson. I think that DSPL really gets a lot of this right, with
two exceptions: XML, and Linked Data (which is an optional part of
DSPL anyway).

Anyway, SDF is a good talking point, and I hope some of the comments
above are of interest.

-N


