[okfn-labs] Data Encoding - Simple Data Format

Rufus Pollock rufus.pollock at okfn.org
Mon May 6 16:09:06 UTC 2013


On 6 May 2013 14:58, Jeffrey Allen <Jeffrey.Allen at utsouthwestern.edu> wrote:

>  All,
>
> In developing the R Client for JSON Table Schemas (
> https://github.com/QBRC/RODProt) we had made some assumptions about the
> data encoding that are turning out to be non-standard. The most significant
> of which is our use of JSON instead of CSV to encode the data
>
There is a brief overview of how these "standards" fit together at
http://data.okfn.org/standards, though it could do with more detail. Let me
recap:

*Data Packages:* entirely payload agnostic - you can have any kind of data,
whether JSON, CSV, shapefile or Excel - absolutely anything. Data Package
resources are supposed to have a format (or mimetype) field which tells you
what format a given piece of data is in.
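
As a rough illustration of that format field, here is a minimal Python
sketch. The descriptor is made up for this example, and the key names
(`resources`, `path`, `format`) are assumptions based on the description
above rather than quoted from the spec.

```python
import json

# Hypothetical descriptor, invented for illustration only.
DESCRIPTOR = json.loads("""
{
  "name": "example-package",
  "resources": [
    {"path": "data/prices.csv", "format": "csv"},
    {"path": "data/regions.json", "format": "json"},
    {"path": "data/borders.shp", "format": "shapefile"}
  ]
}
""")


def resource_formats(descriptor):
    """Map each resource path to its declared format (None if undeclared)."""
    return {
        resource["path"]: resource.get("format")
        for resource in descriptor.get("resources", [])
    }


formats = resource_formats(DESCRIPTOR)
```

A client can inspect this mapping first and decide which resources it is
able to load at all.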

*JSON Table Schema:* not for transporting data at all. Just a simple
JSON-based schema description oriented to "tabular" data (or, at least,
data presentable as rows/records). Data Package *can* use JSON Table Schema
in its schema attribute when describing a resource (data file) - see this
part of the spec
<http://www.dataprotocols.org/en/latest/data-packages.html#tabular-data>.
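
To make the "schema, not data" point concrete, here is a small Python
sketch that uses a schema to turn a row of strings into typed values. The
schema, the sample row and the converter table are all invented for
illustration; the `fields` / `name` / `type` layout is my assumption of the
general shape, and only three type names are handled.

```python
# Hypothetical schema; the "fields"/"name"/"type" layout is assumed here.
SCHEMA = {
    "fields": [
        {"name": "place", "type": "string"},
        {"name": "year", "type": "integer"},
        {"name": "value", "type": "number"},
    ]
}

# Illustrative converters for a few type names; not an exhaustive list.
CASTERS = {
    "string": str,
    "integer": int,
    "number": float,
}


def cast_row(schema, raw_row):
    """Convert a row of strings into typed values, field by field."""
    fields = schema["fields"]
    if len(raw_row) != len(fields):
        raise ValueError(
            "row has %d values, schema has %d fields" % (len(raw_row), len(fields))
        )
    return {
        field["name"]: CASTERS[field["type"]](value)
        for field, value in zip(fields, raw_row)
    }


# Made-up sample row, as it might come out of a CSV reader.
record = cast_row(SCHEMA, ["Atlantis", "2013", "12.5"])
```

The schema itself carries no rows; the data always comes from somewhere
else (here, a hard-coded list).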

*Simple Data Format:* a Data Package where the payload is "good" CSV. The
problem with a plain Data Package is that it can contain any kind of data,
which limits integration and tooling. By specifying that the data is CSV
(and "good" CSV), SDF ensures that we can build that kind of tooling. I note
that all the data on http://data.okfn.org/data is currently in Simple Data
Format.

Comments:

- At present data should be outside the datapackage.json file. I note
however that this issue
<https://github.com/dataprotocols/dataprotocols/issues/36> proposes
allowing inlining data (in the way you seem to be doing).

- Here's an example data package which has JSON data (though TopoJSON in
this case, so not row-like): https://github.com/datasets/ex-topojson

- CSV in the Simple Data Format is broadly interpreted and should really be
described as "DSV" (i.e. delimiter-separated values) in that we
specifically allow other delimiters:
http://www.dataprotocols.org/en/latest/simple-data-format.html#csv-format
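
As a sketch of what "other delimiters" means for a client, the Python
below reads tab-separated text with the standard csv module. The resource
entry and sample data are made up, and the `dialect` / `delimiter` keys are
an assumption about how the delimiter might be declared, so check the spec
link above for the actual field names.

```python
import csv
import io

# Hypothetical resource entry; the "dialect"/"delimiter" keys are assumed.
RESOURCE = {"path": "data.tsv", "format": "csv", "dialect": {"delimiter": "\t"}}

# Made-up tab-separated sample data.
SAMPLE = "id\tname\n1\talpha\n2\tbeta\n"


def read_rows(text, resource):
    """Parse delimited text, using the declared delimiter or a comma."""
    delimiter = resource.get("dialect", {}).get("delimiter", ",")
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


rows = read_rows(SAMPLE, RESOURCE)
```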

> Of course, we can enable the R client to digest both JSON and CSV data.
> But in thinking about how we might want to get CSV support into RODProt,
> I’m realizing that it may make sense, from the client perspective, to be
> encoding-agnostic. I think the client code can be modularized such that we
> can design a “Data Reader” interface and provide default readers for CSV
> and JSON, but it seems to me that JSON Table Schemas (/datapackages) would
> be just as apt at structuring data encoded in tab-delimited, XLS, etc. I
> don’t see any reason why we’d limit RODProt to just one encoding, in that
> case
>
So from the above:

- For cases where you have arbitrary data you are really talking about Data
Packages (I hope the distinction between Data Package and JSON Table Schema
is now clear)
- There's no reason why you shouldn't support Data Packages with a whole
bunch of different data types
- However, supporting Simple Data Format explicitly would be nice (even if
amongst other items)

> I’m having a hard time envisioning where that fits within the “Open
> Knowledge Labs” ecosystem, however. Is that a separate spec that uses JTS?
> An extension to Simple Data Format? Something to be integrated into SDF? If
> a separate spec, is that something that you’d want to brand as part of Open
> Knowledge Labs, or would you rather someone else take ownership?
>
The separate spec I think you are talking about is just plain old Data
Packages - though your "DataReader" may only support some kinds of data
files. I'd hope prominent among those data files would be "CSV" and hence
you'd immediately have support for Simple Data Format.
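
To sketch the "DataReader" idea in code (in Python rather than R, purely
for illustration): readers are registered per format and looked up from a
resource's format field. Every name here (`READERS`, `register_reader`,
`read_resource`) is hypothetical and is not taken from RODProt or any
existing library.

```python
import csv
import io
import json

# Registry mapping a format name to a function that parses text into rows.
READERS = {}


def register_reader(fmt):
    """Decorator that registers a reader function for one format name."""
    def decorator(func):
        READERS[fmt] = func
        return func
    return decorator


@register_reader("csv")
def read_csv(text):
    # Each row becomes a dict keyed by the header line.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@register_reader("json")
def read_json(text):
    # Assumes the JSON payload is already a list of row objects.
    return json.loads(text)


def read_resource(resource, text):
    """Pick a reader from the resource's declared format and apply it."""
    fmt = resource.get("format")
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError("no reader registered for format %r" % fmt)
    return reader(text)
```

With the CSV reader registered, Simple Data Format support follows from
the same code path, and further formats can be added without touching
`read_resource`.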

Summary: it would be great to have an R lib with support for Data Packages,
and especially Data Packages with CSV data (= Simple Data Format) - as per
https://github.com/okfn/data.okfn.org/issues/23

If that R lib also supports Data Packages with JSON data or any other type,
that's even better!

> I’m just not sure how to proceed on that or where to document the formats
> that we’ll support in RODProt moving forward and thought you might have
> some ideas.
>
Hope the above has been useful.

Regards,

Rufus