[@OKau] Standard format for publishing CSV spatial data on data.gov.au - feedback, comments?

Steve Bennett stevage at gmail.com
Tue May 5 10:19:14 UTC 2015

On Tue, May 5, 2015 at 5:49 PM, Simon Cropper <
simoncropper at fossworkflowguides.com> wrote:

> My first thought was really, CSV standards; hasn't that been done to
> death... wouldn't it be satisfactory to just push people to one of the many
> recognised formats and use standards routines / scripts to 'wrangle' them
> into what you require?

Great question. That's essentially the approach I've taken with
opentrees.org and openbinmap.org, although I'm also pushing a standard for
future datasets to be published in. The huge drawback to this approach is
that it essentially creates a second hub. For instance, opentrees.org has an
excellent downloadable CSV file of 9 different tree inventories merged and
cleaned, with 373,000 trees - but it's not on data.gov.au, and under
current policy (government providers only), it never will be.

National Map has no plans to establish itself as a hub either (and that's
Geoscience Australia's call, not NICTA's) - it's just a gateway to datasets
in their existing homes. I can imagine a future in which there is some
additional layer in front of the various data portals to aggregate, merge,
clean, standardise etc, but there is certainly no one proposing to fund
such a thing currently.

Which means that allowing data providers to provide data in any random
format and investing effort at, say, the National Map end to make sense of
it all would place NM in a privileged position and would misrepresent the
actual quality of the data, IMHO. If we want data to be as widely useful to
as many different audiences as possible it should be relatively
standardised in a way that lots of tools can handle it.

> The more you expect from the contributor the less likely that data will be

Yes, but IMHO requirements like "name your latitude field 'lat'" are pretty
easy to meet. I'm very sensitive to the effort required on the part of any
data provider.
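
To give a feel for how small that kind of requirement is, here is a rough
sketch of a header check. It is an illustration only, not part of any
agreed standard: 'lat' is the example above, while 'lon', the
case-insensitive matching and the function name are assumptions made for
this sketch.

```python
import csv
import io

# 'lat' comes from the example in this message; 'lon' is an assumed
# counterpart chosen only for this sketch.
REQUIRED_COLUMNS = ("lat", "lon")


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    # Matching is case-insensitive here, which is a design choice of this sketch.
    present = {name.strip().lower() for name in header}
    return [name for name in required if name not in present]
```

A file whose header reads "Latitude,Longitude" would be reported as missing
both columns, which is the sort of feedback a provider could act on in
seconds.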

> That said, I presume your request is based on the many weird and wonderful
> ways that custodians store and distribute their data, so I will assume you
> have done your homework and already engaged them;

FWIW, the request for standardisation actually came from one of our data
providers, who wanted guidance on how they should publish data.

> The other issue that comes to mind is that you are poking the list from
> your gmail list. Is this request coming from you or is it 'official'
> official? If official then a more formal call should go out to all
> stakeholders, or if it has; can you point us to that page?

Well, I'm still in a kind of exploratory phase. When I have something a bit
more concrete and bureaucrat-friendly I'll certainly be collecting that
kind of feedback. (If you're questioning my use of this email list for this
discussion, that's fine, but maybe start another thread on that? I did
hesitate a bit.)

> Finally, how are you taking feedback on your GitHub repository -- via pull
> requests or via issues being raised?

I started this standard as a wiki page, which turns out not to be a
good idea, because it doesn't have issues, and it doesn't support tables
(weirdly). So discussion by email is probably better, but you can raise an
issue in the NM issue tracker if you prefer.

> 'Here' the @OKau maillist, or 'here' the GitHub repository?

Here. Well, for now anyway.

> >     Do we need to mention character encodings,
> YES, YES, YES (sorry, many many hours of pain here!)

Yeah, I've been going through some encoding pain recently. Urgh.
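
For what it's worth, a minimal sketch of the kind of check a portal or
consumer could run, assuming the standard ended up asking for UTF-8 (that
choice has not been made in this thread, and the function name is made up
for illustration):

```python
def decode_utf8(raw_bytes):
    """Try to decode CSV bytes as UTF-8; return (text, error_message)."""
    try:
        # 'utf-8-sig' is the standard codec that also strips a leading
        # byte order mark if one is present.
        return raw_bytes.decode("utf-8-sig"), None
    except UnicodeDecodeError:
        # Assumption of this sketch: anything that fails to decode is
        # reported rather than guessed at.
        return None, "file is not valid UTF-8"
```

This only detects files that are not UTF-8 at all. It does not try to guess
what the encoding actually is, which is where most of the pain tends to be.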

> I have seen and been involved in this being discussed, debated and
> implemented at the local, state and federal level over the last 20+ years.
> Have you spoken with the major government geospatial data custodians? Have
> you poked the OSGeo list and other lists involved in geomatics in Australia
> and throughout the world?

To be honest, I'm a bit afraid what might happen if I did. I'm certainly
not equipped to attempt to "solve geospatial data for Australia", and not
trying to. The scope of this effort is relatively narrow.

> The previous statement is ambiguous. Can you please clarify? Where is
> 'here' and aren't you asking us to comment on your proposal! Do you mean
> comments should be made on the text currently available on the GitHub
> repository rather than on the text immediately below this comment in this email!

Sorry, I did mean here. In the email thread.

> Sounds dumb in this day and age but you need to get the custodian to
> declare the datum used! Don't assume GDA94.

Geodesy isn't my strong point - is mandating GDA94 not an option?

> Including a EPSG field would help although from memory these lists vary
> between providers (e.g. QGIS, gvSIG and ArcGIS don't have comparable lists).

Yeah, but EPSG:4326 is surely the most standard EPSG of them all?
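
As a rough sketch of what assuming EPSG:4326 buys you: coordinates are
latitude and longitude in degrees, so a simple range check catches swapped
or projected values. The function name is hypothetical, and the check says
nothing about which datum the custodian actually used.

```python
def plausible_epsg4326(lat, lon):
    """Return True if lat/lon fall inside the valid degree ranges."""
    # Latitude runs from -90 to 90 degrees, longitude from -180 to 180.
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```

For example, a Melbourne point given as (-37.8136, 144.9631) passes, while
the same pair with the columns swapped fails because 144.9631 is not a
valid latitude.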
