[@OKau] Standard format for publishing CSV spatial data on data.gov.au - feedback, comments?

Steve Bennett stevage at gmail.com
Mon May 11 13:27:39 UTC 2015

Hi all,
  Thought I should follow up. After a bit more thought, I realised that
it's essentially impossible to have a standard that requires a field to
have a name which is simultaneously human readable, unambiguous in a
variety of contexts, and intuitive to both the data producer and consumer.
I've slightly re-pitched the document, backing away from a "standard" and
making it a "convention", more explicitly geared toward the primary
purpose, which is support in Terria. Obviously I hope the convention is
useful for other purposes as well, but I think there's a bigger problem
that needs a bigger solution, which I hope Open Knowledge's Tabular Data
Format <http://dataprotocols.org/tabular-data-package/> will handle.

A couple of specific changes:
- moved from the "made up out of the blue" field name "au:state" to
"state", with the Australian interpretation made unambiguous by the fact
that you tagged the dataset with aus-geo-csv. (And similarly for many other fields.)
- made the CSV spec more explicit
- recommended EPSG:4283 instead of EPSG:4326 (turns out the difference is
pretty tiny, but will grow over time due to continental drift...)
- discouraged use of LGA names as a lookup mechanism, in favour of LGA codes
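To show why codes make a safer lookup key than names, here is a minimal sketch on hypothetical data. The LGA names, codes, and the `lga_code` / `lga_name` column names are invented for illustration and are not quoted from the convention. `index_by` and `match` are made-up helpers. The point is that LGA names are not guaranteed to be unique: the same council name can occur in more than one state. A join keyed on name therefore becomes ambiguous, while a join keyed on code does not.

```python
import csv
import io

# Hypothetical boundary lookup table: codes and names are invented for illustration.
BOUNDARIES = [
    {"lga_code": "10001", "lga_name": "Springfield", "state": "NSW"},
    {"lga_code": "40001", "lga_name": "Springfield", "state": "SA"},
    {"lga_code": "20001", "lga_name": "Riverton", "state": "VIC"},
]

# Hypothetical published CSV carrying both a code and a name for each LGA.
DATA_CSV = """lga_code,lga_name,population
10001,Springfield,1200
40001,Springfield,3400
20001,Riverton,560
"""


def index_by(rows, key):
    """Map each value of `key` to every boundary row carrying it, so collisions stay visible."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def match(data_rows, boundaries, key):
    """Join data rows to boundaries on `key`; return (unique matches, ambiguous matches)."""
    index = index_by(boundaries, key)
    matched = ambiguous = 0
    for row in data_rows:
        candidates = index.get(row[key], [])
        if len(candidates) == 1:
            matched += 1
        elif len(candidates) > 1:
            ambiguous += 1
    return matched, ambiguous


rows = list(csv.DictReader(io.StringIO(DATA_CSV)))
print(match(rows, BOUNDARIES, "lga_code"))  # (3, 0): every row resolves to one region
print(match(rows, BOUNDARIES, "lga_name"))  # (1, 2): both "Springfield" rows are ambiguous
```

A consumer keyed on name would have to guess, or fall back on another column such as `state` to disambiguate. A consumer keyed on code needs no such guesswork.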

Overall I think it's much better than it was, although not perfect. The
feedback was really useful.


On Wed, May 6, 2015 at 3:12 PM, Steve Bennett <stevage at gmail.com> wrote:

> On Wed, May 6, 2015 at 2:33 PM, Craig Thomler <craig.thomler at gmail.com>
> wrote:
>> My questions and concerns are not about the specific fields you include,
>> but around the need for another standard.
> Sure. Maybe the word "standard" isn't quite right - perhaps "convention"
> would be better. Virtually any time people exchange data through a format
> with definable attributes (like an Excel spreadsheet, a GeoJSON file, etc.)
> there's a fair chance they'll end up with conventions about how to name and use
> those attributes. So if the open data community doesn't want another
> "standard", we'll just publish this as a "guideline for contributors to
> National Map".
>> Can you explain what problems this standard will solve and why
>> existing standards are not able to solve them?
> It will provide concrete guidance to data providers who want to know the
> best way to name a field containing a postcode, or a latitude and
> longitude, or an LGA.
> It will, if followed, simplify the process of using data known to follow
> the standard, by removing the guess work.
> It's a small step towards frictionless data <http://data.okfn.org/> and
> linked open data, where data can flow between different tools and
> applications with a minimum of additional context.
> If there are existing standards that do this, I'd love to hear about them.
>> Did you consider extending an existing standard to solve these problems
>> before reinventing the wheel by creating another new standard (ANS)?
> Isn't formalising conventions as applied to CSV files an example of
> "extending an existing standard"?
>> Does your standard make it harder or impossible for a data provider to
>> meet another standard? Particularly one that they are required to meet on a
>> mandatory basis for legal or contractual reasons?
> Good question. I'm not proposing any prohibition on other fields, so this
> standard would only conflict with another if the other proposed some
> different contents for the same field, or if the other standard prohibited
> fields included here. In the worst case, the result would probably be no
> worse than a data provider having to publish two different CSVs to meet the
> two different sets of requirements.
>> What futureproofing is being built into this standard to stop the next
>> person working at National Maps, or elsewhere in government, from throwing
>> it out and creating another new standard?
> None.
>> Have you calculated the cost for data providers to meet your standard?
>> What benefits will they get to offset these costs?
>> What incentives are there for data providers to follow your standard?
>> Will you compensate them for the cost of meeting it?
> The additional cost of renaming fields to meet the standard is likely to
> be pretty trivial for most data custodians, I believe.  I mean, look at it.
> All it says is "call your state column this, and call your LGA column
> that". We've probably all experienced the horrors of being forced to comply
> with some ghastly 1000 page spec, but that's a very different kettle of
> fish.
> The short term benefit is: "your data automatically appears on National
> Map". If that benefit is not enough to justify the cost of meeting it, then
> the data provider obviously shouldn't do it.
>> Who will endorse this standard as an actual standard? Will it be an
>> industry or government-backed standard, or just a one-man/agency
>> pseudo-standard?
> Let's start out with "pseudo-standard". Maybe it will become some kind of
> de facto community standard that people use because it does something
> useful for them. Something like https://github.com/mapbox/simplestyle-spec
> Steve
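The renaming described in the reply above ("call your state column this, and call your LGA column that") is mechanical enough to sketch. The sketch below assumes a hypothetical mapping from a provider's in-house headers to convention-style names. Apart from `state`, the target names (`postcode`, `lga_code`, `lat`, `lon`) are placeholders, not quotes from the convention document. `detect_columns` is a made-up helper standing in for whatever a consumer such as Terria actually does.

```python
import csv
import io

# Hypothetical mapping from a provider's own headers to convention-style names.
# Only "state" is confirmed by the email; the other targets are placeholders.
RENAMES = {
    "State/Territory": "state",
    "Post Code": "postcode",
    "LGA Code 2011": "lga_code",
    "Latitude": "lat",
    "Longitude": "lon",
}

# Column names a (hypothetical) consumer looks for, grouped by what they locate.
KNOWN_REGION_COLUMNS = {"state", "postcode", "lga_code"}
KNOWN_POINT_COLUMNS = {"lat", "lon"}


def rename_header(csv_text, renames):
    """Rewrite only the header row, leaving data rows and unmapped columns untouched."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([renames.get(name, name) for name in header])
    writer.writerows(reader)
    return out.getvalue()


def detect_columns(csv_text):
    """Report which recognised location columns a CSV header contains."""
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = set(header)
    return {
        "regions": sorted(columns & KNOWN_REGION_COLUMNS),
        "points": KNOWN_POINT_COLUMNS <= columns,
    }


original = "State/Territory,LGA Code 2011,Count\nVIC,20001,7\n"
renamed = rename_header(original, RENAMES)
print(renamed.splitlines()[0])   # state,lga_code,Count
print(detect_columns(original))  # {'regions': [], 'points': False}
print(detect_columns(renamed))   # {'regions': ['lga_code', 'state'], 'points': False}
```

Unmapped columns such as `Count` pass through unchanged. This fits the point above that the convention does not prohibit other fields.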