[data-protocols] RFC: JSON Table Schema

Rufus Pollock rufus.pollock at okfn.org
Thu Nov 29 14:07:02 GMT 2012


Just to say I've now added a Related Work section:

http://www.dataprotocols.org/en/latest/json-table-schema.html#appendix-related-work

And both DSPL and json-stat were already in the "survey" doc on
web-oriented data formats:

http://www.dataprotocols.org/en/latest/data-formats.html

I do think DSPL is great but don't like the XML side. As you say, there is
dspljson (Nick is in fact a colleague), and I know Friedrich is very partial
to this, so it would be good to have his views (though I note that in the Open
Spending Standard, which Friedrich helped put together, the "basis" ended up
being GTFS rather than DSPL [1]).

Rufus

[1]: http://openspending.org/resources/standard/technical.html


On 29 November 2012 08:11, Xavier Badosa <xbadosa at gmail.com> wrote:

> Tom,
>
> As has already been mentioned on this list, there's a proposal for a JSON
> façade of the DSPL format:
>
> https://github.com/nickstenning/dspljson
>
> DSPL is actually more a CSV-based format than an XML-based (ick!) format.
> It probably makes sense because it's not an exchange/dissemination
> standard, but mostly an upload one.
>
> X.
>
>
> On Wed, Nov 28, 2012 at 9:27 PM, Tom Morris <tfmorris at gmail.com> wrote:
>
>> Another data point is Google's Dataset Publishing Language (DSPL)
>> https://developers.google.com/public-data/docs/developer_guide
>>
>> It's XML-based (ick!), but includes dataset level metadata, which can be
>> useful for provenance, in addition to the schema.
>>
>> Tom
>>
>> On Wed, Nov 28, 2012 at 2:42 PM, Xavier Badosa <xbadosa at gmail.com> wrote:
>>
>>> Hi Rufus,
>>>
>>> Your simple schema for tabular data is interesting: it's similar to, but
>>> more powerful than, the schema used by the US Census Bureau API:
>>>
>>> http://www.census.gov/developers/
>>>
>>> It's important to notice that many times what is considered "tabular
>>> data" (in your sense: some fields that are shared by a set of individuals)
>>> could be better represented in a cube model. Take for example the Census
>>> API:
>>>
>>> [
>>>   ["P0010001","NAME","state"],
>>>   ["710231","Alaska","02"],
>>>   ["4779736","Alabama","01"],
>>>   ["2915918","Arkansas","05"],
>>>   ["6392017","Arizona","04"],
>>>   ["37253956","California","06"],
>>>   ...
>>> ]
>>>
>>> Rows in this example have an ID and this ID represents the possible
>>> values of a "variable" or "dimension" ("state" in the example). Instead of
>>> saying that this is some tabular data of individuals (that happen to be
>>> states) with field "population" ("P0010001"), it seems more accurate to see
>>> it as a table ("table" in the statistical sense, not in the DB sense) or
>>> cube of population by state. This is a very frequent situation in
>>> statistics.
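
A minimal Python sketch of the cube reading described above, using the rows from the Census example. The function name `to_cube` and the layout of the returned dictionary are hypothetical, chosen only for illustration; this is not the JSON-stat format and not the Census API's own model.

```python
# Census-style response copied from the example above; the first row is the header.
rows = [
    ["P0010001", "NAME", "state"],
    ["710231", "Alaska", "02"],
    ["4779736", "Alabama", "01"],
    ["2915918", "Arkansas", "05"],
    ["6392017", "Arizona", "04"],
    ["37253956", "California", "06"],
]


def to_cube(rows, dimension="state", label="NAME", measure="P0010001"):
    """Re-read a header-plus-rows table as a one-dimensional cube.

    The output layout is an assumption made for this sketch: the
    dimension's category ids and labels are kept apart from the values.
    """
    header, *records = rows
    dim_i = header.index(dimension)
    label_i = header.index(label)
    measure_i = header.index(measure)
    return {
        "dimension": dimension,
        # category id -> human-readable label
        "categories": {r[dim_i]: r[label_i] for r in records},
        # category id -> measured value (population), parsed from its string form
        "values": {r[dim_i]: int(r[measure_i]) for r in records},
    }


cube = to_cube(rows)
print(cube["values"]["06"])
```

In this reading the "state" column is no longer just another field: its values index the population figures, which is what "population by state" means.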
>>>
>>> To solve this special case (tabular data that is actually cubical,
>>> multidimensional) I have proposed JSON-stat
>>>
>>> http://json-stat.org/doc/
>>>
>>> Besides, the statistical community uses the SDMX standard for expressing
>>> statistics and is currently working on a JSON façade (SDMX-JSON). I'm a
>>> member of the SDMX-JSON group. JSON-stat is used in that group as a
>>> starting point.
>>>
>>> Probably we could benefit from some of your ideas.
>>>
>>>
>>> On Mon, Nov 26, 2012 at 11:34 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I've been working on a simple schema for tabular data. The schema is
>>>> designed to be expressible in JSON.
>>>>
>>>> http://www.dataprotocols.org/en/latest/json-table-schema.html
>>>>
>>>> This is still incomplete (e.g. need to have format specified in more
>>>> detail) but I'd be very interested in any feedback or thoughts (e.g.
>>>> is this re-inventing the wheel - if so what is better?).
>>>>
>>>> Regards,
>>>>
>>>> Rufus
>>>>
>>>> ## Background
>>>>
>>>> In many ways this is just an extraction, with some refactoring, of
>>>> what was in the Simple Data Format spec:
>>>>
>>>> <http://www.dataprotocols.org/en/latest/simple-data-format.html>
>>>>
>>>> Splitting out into its own mini-RFC is good because smaller pieces are
>>>> more useful and it makes it re-usable (e.g. can be used from the data
>>>> packages spec).
>>>>
>>>> Real world use: something very like this is used in ReclineJS:
>>>> <http://reclinejs.com/docs/models.html#field> and also in the CKAN API
>>>> <http://docs.ckan.org/en/ckan-1.8/datastore-api.html>
>>>>
>>>> _______________________________________________
>>>> data-protocols mailing list
>>>> data-protocols at lists.okfn.org
>>>> http://lists.okfn.org/mailman/listinfo/data-protocols
>>>> Unsubscribe: http://lists.okfn.org/mailman/options/data-protocols
>>>>


-- 
Co-Founder, Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/