[data-protocols] Simple Data Format: straw man

Rufus Pollock rufus.pollock at okfn.org
Tue May 15 12:15:02 BST 2012


On 15 May 2012 10:18, Chris Taggart <countculture at gmail.com> wrote:
> Good stuff. Couple of observations.
>
> There is a CSV spec, http://tools.ietf.org/html/rfc4180, although it's not
> always taken as the definitive spec.
>
> Only allowing CSV for the data (vs JSON as an option) could be a big pain

So you'd vote +1 for JSON as an option for the data transport. I like
that too, but I was trying to avoid having too much flexibility. I
probably should do some work on the data packages spec [1] so it can
support things like a manifest or similar. Also, how would we handle
"line-oriented" JSON or something similar?

[1]: http://www.dataprotocols.org/en/latest/packages.html
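
To sketch what I mean by "line-oriented" JSON (purely illustrative,
nothing the spec defines yet): each record is a single JSON object on
its own line, so a consumer can stream records the same way it streams
CSV rows. The file name and field names below are made up:

    import json

    # Hypothetical example: stream newline-delimited JSON one record
    # at a time, much as you would iterate over CSV rows.
    with open("countries.jsonl") as f:   # illustrative file name
        for line in f:
            if not line.strip():
                continue                  # skip blank lines
            record = json.loads(line)     # one JSON object per line
            print(record["name"], record["iso_code"])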

> barrier for more complex datasets. There's a lot of information we don't
> publish on the OpenCharities data dump because it's heavily nested (and FWIW
> we serialize some of that nested info in the DB as there's no practical
> benefit to normalising it into individual tables).

Agreed.

> Conversely, denormalising large numbers of CSV files for an import can be a
> real pain point particularly where you have several layers of foreign keys
> (we're currently hitting this on a number of company datasets). Often

Understood, but how could we make this better without requiring full
SQL? Would JSON be acceptable?

> importing data from an outside source means you only want a portion of the
> data, but to do this you may need, effectively, to import into a temporary
> database, then do a join to get the data you want. You can see many
> situations where this would be the case, which could be avoided if JSON was
> allowed.

Agreed.
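
To make the join point concrete, here is a rough sketch (again purely
illustrative, not part of any spec, and the field names are invented)
of how a nested JSON record could carry what would otherwise be several
foreign-keyed CSV files, so an importer can keep just the portion it
wants without building a temporary database and joining first:

    import json

    # Hypothetical nested record: officers and filings are embedded
    # rather than split into separate CSV files linked by foreign keys.
    company = json.loads("""
    {
      "company_number": "00000001",
      "name": "Example Ltd",
      "officers": [
        {"name": "A. Person", "role": "director"}
      ],
      "filings": [
        {"date": "2012-01-31", "type": "annual-return"}
      ]
    }
    """)

    # Keep only the portion we care about -- no temporary tables, no joins.
    directors = [o for o in company["officers"] if o["role"] == "director"]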

> Hope this is useful feedback,

Very useful :-)

rufus


