[@OKau] Standard format for publishing CSV spatial data on data.gov.au - feedback, comments?

Simon Cropper simoncropper at fossworkflowguides.com
Tue May 5 13:50:56 UTC 2015


Hi Steve,

Responses in-line

On 05/05/15 20:19, Steve Bennett wrote:
> On Tue, May 5, 2015 at 5:49 PM, Simon Cropper
> <simoncropper at fossworkflowguides.com> wrote:
>
>     My first thought was really, CSV standards; hasn't that been done to
>     death... wouldn't it be satisfactory to just push people to one of
>     the many recognised formats and use standards routines / scripts to
>     'wrangle' them into what you require?
>
>
> Great question. That's essentially the approach I've taken with
> opentrees.org <http://opentrees.org> and openbinmap.org
> <http://openbinmap.org>, although I'm also pushing a standard for future
> datasets to be published in. The huge drawback to this approach is it
> essentially creates a second hub. For instance, opentrees.org has an
> excellent downloadable CSV file of 9 different tree inventories merged
> and cleaned, with 373,000 trees - but it's not on data.gov.au
> <http://data.gov.au>, and under current policy (government providers
> only), it never will be.
>
> National Map has no plans to establish itself as a hub either (and
> that's Geoscience Australia's call, not NICTA's) - it's just a gateway
> to datasets in their existing homes. I can imagine a future in which
> there is some additional layer in front of the various data portals to
> aggregate, merge, clean, standardise etc, but there is certainly no one
> proposing to fund such a thing currently.
>
> Which means that allowing data providers to provide data in any random
> format and investing effort at, say, the National Map end to make sense
> of it all would place NM in a privileged position and would misrepresent
> the actual quality of the data, IMHO. If we want data to be as widely
> useful to as many different audiences as possible it should be
> relatively standardised in a way that lots of tools can handle it.
>
>     The more you expect from the contributor the less likely that data
>     will be provided.
>
> Yes, but IMHO requirements like "name your latitude field 'lat'" are
> pretty easy to meet. I'm very sensitive to the effort required on the
> part of any data provider.
>
>     That said, I presume your request is based on the many weird and
>     wonderful ways that custodians store and distribute their data, so I
>     will assume you have done your homework and already engaged them;
>
>
> FWIW, the request for standardisation actually came from one of our data
> providers, who wanted guidance on how they should publish data.
>
>     The other issue that comes to mind is that you are poking the list
>     from your gmail account. Is this request coming from you or is it
>     'official' official? If official then a more formal call should go
>     out to all stakeholders or, if it has, can you point us to that page?
>
>
> Well, I'm still in a kind of exploratory phase. When I have something a
> bit more concrete and bureaucrat-friendly I'll certainly be collecting
> that kind of feedback. (If you're questioning my use of this email list
> for this discussion, that's fine, but maybe start another thread on
> that? I did hesitate a bit.)

Not questioning the posting of the question on the list. Just want to 
know who is asking and how any feedback will be used.

>
>     Finally, how are you taking feedback on your GitHub repository --
>     via pull requests or via issues being raised?
>
>
> I started this standard as a wiki page which actually turns out not to
> be a good idea, because it doesn't have issues, and it doesn't support
> tables (weirdly). So discussion by email is probably better, but you can
> raise an issue in the NM issue tracker if you prefer.
>
>     'Here' the @OKau maillist, or 'here' the GitHub repository?
>
>
> Here. Well, for now anyway.
>
>     >     Do we need to mention character encodings,
>
>     YES, YES, YES (sorry, many many hours of pain here!)
>
>
> Yeah, I've been going through some encoding pain recently. Urgh.
>
>     I have seen and been involved in this being discussed, debated and
>     implemented at the local, state and federal level over the last 20+
>     years. Have you spoken with the major government geospatial data
>     custodians? Have you poked the OSGeo list and other lists involved
>     in geomatics in Australia and throughout the world?
>
>
> To be honest, I'm a bit afraid what might happen if I did. I'm certainly
> not equipped to attempt to "solve geospatial data for Australia", and
> not trying to. The scope of this effort is relatively narrow.

That said, these lists have people in government that have access to 
their own metadata and data dictionary standards and can point you to 
international standards.

>     The previous statement is ambiguous. Can you please clarify? Where
>     is 'here', and aren't you asking us to comment on your proposal? Do
>     you mean comments should be made on the text currently available on
>     the GitHub repository rather than on the text immediately below this
>     comment in this email?
>
>
> Sorry, I did mean here. In the email thread.
>
>     Sounds dumb in this day and age but you need to get the custodian to
>     declare the datum used! Don't assume GDA94.
>
> Geodesy isn't my strong point - is mandating GDA94 not an option?

You could, but not everyone has the ability to convert old data. The
more you expect the custodian to massage, transform and convert the
data, the less likely it is that the data will be released.

In all the years I have been involved in munging and wrangling data on
flora and fauna in Victoria and across Australia, I know without doubt
that I would never have managed to get access to any data unless I took
it in the format it was provided and added value by doing the
transformation, cleansing and standardisation required.
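
To make that concrete, here is a minimal sketch of the sort of wrangling
I mean, using only the Python standard library. The header aliases, the
encoding fallback order and the function names are all hypothetical, and
the 'lat' / 'lon' target names are just the ones floated earlier in this
thread, not a settled standard.

```python
import csv
import io

# Hypothetical header aliases a custodian might use; extend as needed.
LAT_ALIASES = {"lat", "latitude", "y"}
LON_ALIASES = {"lon", "lng", "long", "longitude", "x"}


def decode_bytes(raw, encodings=("utf-8-sig", "cp1252")):
    """Try each encoding in turn and return (text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode with any of: " + ", ".join(encodings))


def normalise_header(name):
    """Map known latitude/longitude aliases to 'lat' / 'lon'."""
    key = name.strip().lower()
    if key in LAT_ALIASES:
        return "lat"
    if key in LON_ALIASES:
        return "lon"
    return name.strip()


def read_points(raw):
    """Decode raw CSV bytes and return rows keyed by normalised headers."""
    text, _ = decode_bytes(raw)
    reader = csv.reader(io.StringIO(text))
    header = [normalise_header(h) for h in next(reader)]
    return [dict(zip(header, row)) for row in reader]
```

The point is that this effort sits with whoever consumes the data, not
with the custodian. Guessing the encoding is a fallback only; a declared
encoding would always be better.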

>
>
>     Including an EPSG field would help although from memory these lists
>     vary between providers (e.g. QGIS, gvSIG and ArcGIS don't have
>     comparable lists).
>
>
> Yeah, but EPSG:4326 is surely the most standard EPSG of them all?

Steve, we are in Australia! If you fall back to anything then it should
be the Australian standard, GDA94 (EPSG:4283). I would expect
Geoscience Australia to absolutely mandate this!
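
As a rough sketch of "declare the datum, and only fall back when it is
missing": the snippet below reads an optional per-row column and uses
GDA94 when it is blank. The column name 'epsg', the sample rows and the
helper name are assumptions for illustration only, not part of any
proposal.

```python
import csv
import io

# Fallback argued for above: GDA94 (EPSG:4283). An assumption, not a rule.
DEFAULT_EPSG = 4283


def row_epsg(row, field="epsg", default=DEFAULT_EPSG):
    """Return the EPSG code declared for a row, or the default if blank/missing."""
    value = (row.get(field) or "").strip().upper()
    if not value:
        return default
    if value.startswith("EPSG:"):
        value = value[len("EPSG:"):]
    return int(value)


# Hypothetical sample: the first row declares WGS84, the second declares nothing.
sample = "name,lat,lon,epsg\nA,-37.8,144.9,EPSG:4326\nB,-33.9,151.2,\n"
codes = [row_epsg(r) for r in csv.DictReader(io.StringIO(sample))]
print(codes)
```

This only records which datum was declared; it does not transform any
coordinates.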

> Steve


-- 
Cheers Simon

    Simon Cropper - Open Content Creator

    Free and Open Source Software Workflow Guides
    ------------------------------------------------------------
    Introduction               http://www.fossworkflowguides.com
    GIS Packages           http://www.fossworkflowguides.com/gis
    bash / Python    http://www.fossworkflowguides.com/scripting


