[data-protocols] Simple Data Format: straw man

Francis Irving francis at flourish.org
Tue May 15 18:47:16 BST 2012


*Off the top of my head* reactions...

a) Absolutely *love* that it is backwards compatible with CSV. That's the
current format people use.

b) Separate schema seems overkill. Seriously, you need to say
something is an integer or a date explicitly? Nobody cares. It's
almost an insult to how much more you need to know to really
understand a dataset (source, quality, methodology of collection,
up-to-dateness etc.). It's almost always going to be inferable from
content, or could be gotten by hints in column headings.

c) I think I'd be happier with something where I put URLs in the
column headings and/or id field as a minimal-but-useful amount of data
linking.

But the above isn't a criticism, just what it made me think.
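
To make point b) concrete, here is a minimal sketch of inferring column types from content alone, with no separate schema. It is only an illustration under assumptions: the function names `infer_type` and `infer_schema`, the type labels, and the single ISO date pattern are all hypothetical and not part of the proposed Simple Data Format.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Guess a column type from its string values.

    Returns one of "integer", "number", "date" or "string".
    Empty cells are ignored; an all-empty column is treated as "string".
    """
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "string"

    def all_parse(parser):
        # True only if every non-empty value parses without a ValueError.
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "number"
    # Assumption: dates are written as YYYY-MM-DD; real data would need more patterns.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def infer_schema(csv_text):
    """Map each column heading of a CSV document to an inferred type."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    columns = zip(*body) if body else [() for _ in header]
    return {name: infer_type(col) for name, col in zip(header, columns)}


sample = "id,amount,date,name\n1,2.5,2012-05-15,foo\n2,3,2012-05-16,bar\n"
schema = infer_schema(sample)
```

For the sample above, `schema` maps `id` to "integer", `amount` to "number", `date` to "date" and `name` to "string". A guess like this can of course be wrong (postcodes that look like integers, for instance), which is where hints in column headings would come in.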


To get this going, I'd spend lots of time thinking and doing two
things:

1) The market side. How is support for this going to spread through
tools and dominate the world? What community (a marketing one, like
when Firefox was new :) or what
organisation-with-money-and-appropriate-leadership will make that
happen?

2) The user side. Starting with CSV is an excellent thought. But
beyond that, which users will give a damn to start with? How can it be
maximally usable for them?


By way of analogy... Have a read of Yishan Wong's explanation as to
why OpenID failed.

http://www.quora.com/OpenID/What-s-wrong-with-OpenID/answer/Yishan-Wong

It would be easy to make a geeky thing in the dataset sharing world
that failed for similar (but different, of course!) category errors.

Francis

On Tue, May 15, 2012 at 09:52:34AM +0100, Rufus Pollock wrote:
> Hi,
> 
> Substantially inspired by Google's Dataset Publishing Language (DSPL),
> I've put together a specification for a "Simple Data Format":
> 
> <http://www.dataprotocols.org/en/latest/simple-data-format.html>
> 
> Feedback and comments very welcome. Some further background info below.
> 
> Regards,
> 
> Rufus
> 
> 
> ## Purpose
> 
> The format’s focus is on simplicity and web usage – that is, usage
> online with access and transmission over HTTP. In addition the format
> is focused on data that can be presented in a tabular structure and in
> making it easy to produce (and consume) this format from spreadsheets
> and relational databases.
> 
> Main difference from DSPL:
> 
> This specification owes a great deal to the excellent Dataset
> Publishing Language (DSPL) put forward by Google. The main difference
> is in using JSON instead of XML for the schema and re-using as far as
> possible the JSON-LD schema language (based on linked-data) rather
> than inventing a new type and schema structure.
> 
> ## Motivating examples
> 
> Collection of time series, e.g. YourTopia dataset:
> http://datahub.io/dataset/yourtopia-italy
> 
> OpenSpending data format: http://wiki.openspending.org/Data_Format


