[data-protocols] Extending the JSON Table Schema for scientific applications

Rufus Pollock rufus.pollock at okfn.org
Tue Apr 15 16:13:42 UTC 2014


First off: welcome and great to hear from you!

On 15 April 2014 14:25, Tom Aldcroft <taldcroft at gmail.com> wrote:

> Hi -
>
> I am working on a standard for text data tables for use in science
> applications, in particular in the context of the Python astropy package (
> http://astropy.org).  This includes support for reading and writing ASCII
> tables in various formats that are common in astronomy (
> http://astropy.readthedocs.org/en/latest/io/ascii/index.html).
>
> The draft proposal I submitted baselines an approach that is very similar
> to the Tabular Data Package standard in data-protocols.  After discussion
> we are very interested in adopting the JSON Table Schema from Data
> Protocols.  See:
>
>  https://github.com/taldcroft/astropy-APEs/blob/ape6/APE6.rst
>  https://github.com/astropy/astropy-APEs/pull/7
>
> The question I have is to what extent your organization would be
> interested in extending the JSON Table Schema standard to include more
> optional elements that would be common in science applications.  As a rough
> outline, we would like to see:
>

Just to be clear, are you talking about JSON Table Schema or Tabular Data
Package? As you know, Tabular Data Package is basically "Data Package" +
JSON Table Schema (for describing the CSVs) + CSV (for the data).

>
> At the top level:
>
> - "schema-name": optional, name (or URL) of detailed schema which allows
> for interpretation and validation of the table and header content
>

Our instinct at the moment is to support this via "profiles":
https://github.com/dataprotocols/dataprotocols/issues/87

We're still a little vague about exactly how this would work, but the idea
is that you'd register a profile name and then it would be up to tools to
do something if they recognize a given profile (so in your case you could
have a profile "astropy" or similar, and your tools would recognize that
and do some extra validation).
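
To make the profile idea a bit more concrete, here is a minimal Python
sketch of how a tool might react to profiles it recognizes and ignore the
rest. Everything specific in it is an assumption for illustration only: the
"profiles" key, the "astropy" profile name, and the rule that every field
must carry a "unit" are all hypothetical and not part of any published spec.

```python
# Hypothetical sketch: the "profiles" key, the "astropy" profile name and
# the validation rule below are assumptions, not part of any spec.

def validate_astropy(descriptor):
    """Illustrative extra check: every field must declare a unit."""
    return [
        "field %r has no unit" % field.get("name")
        for field in descriptor.get("fields", [])
        if "unit" not in field
    ]

# Registry mapping profile names to the extra validation this tool knows.
KNOWN_PROFILES = {"astropy": validate_astropy}

def run_profile_checks(descriptor):
    """Run validators for recognized profiles; skip unknown profiles."""
    errors = []
    for name in descriptor.get("profiles", []):
        validator = KNOWN_PROFILES.get(name)
        if validator is not None:
            errors.extend(validator(descriptor))
    return errors

schema = {
    "profiles": ["astropy", "some-other-profile"],
    "fields": [
        {"name": "velocity", "type": "number", "unit": "m/s"},
        {"name": "flag", "type": "integer"},
    ],
}
print(run_profile_checks(schema))
```

A tool that does not know a profile simply does nothing with it, so a file
stays readable by generic tools while profile-aware tools can be stricter.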


> - "keywords": optional, list of keyword structures which includes {"name"
> (req'd), "value" (req'd), "unit" (optional), "description" (optional)}
>

Not sure I get this. Would this be at the JSON Table Schema level or the
Data Package level?
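
For concreteness, one possible reading of the proposed structure, whichever
level it ends up on, is sketched below in Python. The required and optional
key names come from the quoted proposal; the example entries and the
check_keyword helper are made up purely for illustration.

```python
# Key names follow the quoted proposal; the sample entries and the
# check_keyword helper are hypothetical.
REQUIRED = ("name", "value")
OPTIONAL = ("unit", "description")

def check_keyword(keyword):
    """Return (missing required keys, unrecognized keys) for one entry."""
    missing = [key for key in REQUIRED if key not in keyword]
    unknown = [key for key in keyword if key not in REQUIRED + OPTIONAL]
    return missing, unknown

keywords = [
    {"name": "exposure", "value": 1200.0, "unit": "s",
     "description": "illustrative entry, not real data"},
    {"name": "observer", "value": "anonymous"},
]

for keyword in keywords:
    print(keyword["name"], check_keyword(keyword))
```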


> - "comments": optional, list of general string comments
>

Again, at what level is this wanted, and what's the planned usage? Are
these comments on particular data fields or columns?


> - "history": optional, list of records indicating processing history
> (providing provenance of the data in the file).
>

This is definitely interesting but my concern is what exactly its
interpretation would be. I note there is a proposal around adding a
"scripts" field: https://github.com/dataprotocols/dataprotocols/issues/114

BTW: I should emphasize that the Data Package spec in theory allows one to
add fields as one likes. That said, there is a preference not to
unnecessarily extend (or to get items into core spec) - and even an
argument (https://github.com/dataprotocols/dataprotocols/issues/103) that
we should not allow extension at all ...

> In the standard "fields" specification:
>
>  - "unit": optional, specifies physical unit of data (e.g. m/s)
>

As you probably know there is a units spec here
http://dataprotocols.org/units/

Would definitely be possible to allow this as an optional enhancement.
Alternative would be something linked-data-y in terms of the type field -
see


>  - "dtype": optional, detailed data type which indicates a specific binary
> representation (e.g. float32, int8, uint8, float64).  This can be important
> in numerical applications and is required to round-trip a data structure to
> file and back with minimal information loss.
>

What's the exact use case for dtype beyond the current type? How important
is it (nowadays) to distinguish different types of floats or ints?
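
To pin down what is at stake in the round-trip point, here is a small
sketch using only Python's standard struct module. It is not tied to
astropy or to any schema; it just shows how the same value or the same byte
changes depending on which binary representation is assumed.

```python
import struct

def roundtrip(fmt, value):
    """Pack a value into the given binary format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

x = 0.1
as_float64 = roundtrip("<d", x)  # 8-byte IEEE 754 double
as_float32 = roundtrip("<f", x)  # 4-byte IEEE 754 single

print(as_float64 == x)  # True: a double survives the round trip exactly
print(as_float32 == x)  # False: single precision changes the value

# The same byte means different things as uint8 and as int8.
raw = struct.pack("<B", 200)        # stored as uint8
print(struct.unpack("<b", raw)[0])  # -56 when read back as int8
```

Without something like dtype, a reader only knows "number" or "integer" and
has to guess the width and signedness when rebuilding the in-memory array.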


> At this point I'm mostly interested in general discussion of whether it's
> worth opening a pull request to extend the JSON Table Schema in the
> direction I've outlined, with details TBD.
>

Sounds good. I also note a lot of discussion goes on in the issue tracker
at https://github.com/dataprotocols/dataprotocols/issues

Rufus