[data-protocols] Simple Data Format: straw man

Chris Taggart countculture at gmail.com
Tue May 15 10:18:56 BST 2012


Good stuff. Couple of observations.

There is a CSV spec, http://tools.ietf.org/html/rfc4180, although it's not
always taken as the definitive spec.
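For reference, RFC 4180 asks for a few core things: CRLF line breaks, double quotes around fields that contain commas, quotes or line breaks, and embedded quotes doubled. Python's standard csv module produces roughly this with its default "excel" dialect. This is a minimal sketch, and the sample rows are made up for illustration:

```python
import csv
import io

# Hypothetical sample rows; the second field of the last row contains
# both a comma and double quotes.
rows = [
    ["name", "note"],
    ["Acme Ltd", 'Says "hello", then leaves'],
]

buf = io.StringIO()
# The default "excel" dialect writes "\r\n" line endings and uses
# QUOTE_MINIMAL with doubled embedded quotes, in line with RFC 4180.
writer = csv.writer(buf)
writer.writerows(rows)
output = buf.getvalue()
print(repr(output))
```

Reading `output` back with `csv.reader` returns the original rows unchanged.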

Allowing only CSV for the data (rather than offering JSON as an option) could
be a big barrier for more complex datasets. There's a lot of information we
don't publish in the OpenCharities data dump because it's heavily nested
(and, FWIW, we store some of that nested info serialized in the DB, as
there's no practical benefit to normalising it into individual tables).
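To make the nesting problem concrete, here is a small sketch. The record and its field names are hypothetical, not the actual OpenCharities schema. JSON carries the nested list as-is; a flat CSV row has to either stuff it into one serialized cell or split it out into a second table keyed on the parent record:

```python
import csv
import io
import json

# Hypothetical nested record; field names are illustrative only.
charity = {
    "charity_number": "123456",
    "title": "Example Trust",
    "trustees": [
        {"name": "A. Smith", "appointed": "2010-01-01"},
        {"name": "B. Jones", "appointed": "2011-06-30"},
    ],
}

# JSON keeps the nesting intact.
as_json = json.dumps(charity)

# Option 1 for CSV: serialize the nested list into a single cell.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["charity_number", "title", "trustees_json"])
writer.writerow([
    charity["charity_number"],
    charity["title"],
    json.dumps(charity["trustees"]),
])
flat_csv = buf.getvalue()

# Option 2 for CSV: normalise into a second table keyed by charity_number.
trustee_rows = [
    [charity["charity_number"], t["name"], t["appointed"]]
    for t in charity["trustees"]
]
```

Option 1 keeps one file but forces consumers to parse JSON inside CSV anyway; option 2 keeps CSV pure but pushes the reassembly work onto whoever imports it.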

Conversely, denormalising large numbers of CSV files for an import can be a
real pain point, particularly where you have several layers of foreign keys
(we're currently hitting this on a number of company datasets). When
importing data from an outside source you often want only a portion of it,
but to extract that portion you may effectively need to import everything
into a temporary database and then do a join. There are many situations like
this that could be avoided if JSON were allowed.
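A rough sketch of the two routes, using made-up data: three normalised CSV files linked by two layers of foreign keys (companies, officers, addresses), versus the same relationships nested in one JSON document. The table names, columns and the `load` helper are all hypothetical. The CSV route needs a throwaway database (here an in-memory SQLite one) and a join just to answer "who are Acme's officers and where are they?"; the JSON route is a direct walk of the structure:

```python
import csv
import io
import json
import sqlite3

# Hypothetical normalised CSV files, inlined for the sketch.
COMPANIES = "id,name\r\n1,Acme Ltd\r\n2,Widgets plc\r\n"
OFFICERS = "id,company_id,name\r\n10,1,Jane Doe\r\n11,2,John Roe\r\n"
ADDRESSES = "officer_id,city\r\n10,Leeds\r\n11,Bristol\r\n"


def load(conn, table, text):
    """Hypothetical helper: load one CSV text into a table (all columns TEXT)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', body)


# CSV route: import every file into a temporary database, then join.
conn = sqlite3.connect(":memory:")
for name, text in [("companies", COMPANIES), ("officers", OFFICERS),
                   ("addresses", ADDRESSES)]:
    load(conn, name, text)
csv_result = conn.execute(
    """SELECT o.name, a.city
       FROM companies c
       JOIN officers o ON o.company_id = c.id
       JOIN addresses a ON a.officer_id = o.id
       WHERE c.name = ?""",
    ("Acme Ltd",),
).fetchall()

# JSON route: the same relationships nested in one document.
NESTED = json.loads("""[
  {"name": "Acme Ltd",
   "officers": [{"name": "Jane Doe", "address": {"city": "Leeds"}}]},
  {"name": "Widgets plc",
   "officers": [{"name": "John Roe", "address": {"city": "Bristol"}}]}
]""")
json_result = [
    (o["name"], o["address"]["city"])
    for c in NESTED if c["name"] == "Acme Ltd"
    for o in c["officers"]
]
```

The trade-off runs the other way too: nesting can duplicate entities that are shared between parents (an officer sitting on several boards, say), which normalised tables avoid.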

Hope this is useful feedback,

Chris



On 15 May 2012 09:52, Rufus Pollock <rufus.pollock at okfn.org> wrote:

> Hi,
>
> Substantially inspired by Google's Dataset Publishing Language (DSPL),
> I've put together a specification for a "Simple Data Format":
>
> <http://www.dataprotocols.org/en/latest/simple-data-format.html>
>
> Feedback and comments very welcome. Some further background info below.
>
> Regards,
>
> Rufus
>
>
> ## Purpose
>
> The format’s focus is on simplicity and web usage – that is, usage
> online with access and transmission over HTTP. In addition, the format
> focuses on data that can be presented in a tabular structure and on
> making it easy to produce (and consume) this format from spreadsheets
> and relational databases.
>
> Main difference from DSPL:
>
> This specification owes a great deal to the excellent Dataset
> Publishing Language (DSPL) put forward by Google. The main difference
> is in using JSON instead of XML for the schema and re-using as far as
> possible the JSON-LD schema language (based on linked-data) rather
> than inventing a new type and schema structure.
>
> ## Motivating examples
>
> Collection of time series, e.g. YourTopia dataset:
> http://datahub.io/dataset/yourtopia-italy
>
> OpenSpending data format: http://wiki.openspending.org/Data_Format
>
> _______________________________________________
> data-protocols mailing list
> data-protocols at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/data-protocols
>



-- 
-------------------------------------------------------
OpenCorporates :: The Open Database of the Corporate World
http://opencorporates.com
OpenlyLocal :: Making Local Government More Transparent
http://openlylocal.com
Blog: http://countculture.wordpress.com
Twitter: http://twitter.com/CountCulture