[@OKau] Standard format for publishing CSV spatial data on data.gov.au - feedback, comments?
simoncropper at fossworkflowguides.com
Tue May 5 07:49:01 UTC 2015
My first thought was: really, CSV standards? Hasn't that been done to
death? Wouldn't it be satisfactory to just push people to one of the
many recognised formats and use standard routines / scripts to
'wrangle' them into what you require? The more you expect from the
contributor, the less likely it is that data will be provided.
That said, I presume your request is based on the many weird and
wonderful ways that custodians store and distribute their data, so I
will assume you have done your homework and already engaged them; and
the problem is not so much the actual format but what data is bundled
with the data and the field names!
The other issue that comes to mind is that you are poking the list from
your gmail account. Is this request coming from you personally, or is it
official? If official, then a more formal call should go out to all
stakeholders, or, if it already has, can you point us to that page?
Finally, how are you taking feedback on your GitHub repository -- via
pull requests or via issues being raised?
The remainder of my comments are in-line...
On 05/05/15 16:09, Steve Bennett wrote:
> Hi all,
> By day, I work on National Map <http://nationalmap.gov.au>, which
> scours data.gov.au <http://data.gov.au> and other portals for open
> spatial data from state and federal open data portals (including local
> government data). It lets you choose which of these datasets to view on
> the map, primarily supporting quick assessment of value in a dataset or
> to answer basic questions.
> Now, we'd like to get a bit more systematic about harvesting CSV data.
> There are two main types:
> 1) Point data described by latitude and longitude
> 2) Administrative region described by reference to a known region such
> as a suburb, postcode, local government area (LGA), ABS statistical area
> (Arbitrary line and polygon features are out of scope for now - they're
> better published as GeoJSON in any case).
I'm confused. You talk about harvesting data and providers publishing
data. These are quite distinct activities by different people.
When you harvest YOU wrangle the data into submission using whatever
scripting language is most suitable. In this context you encourage
providers to not change their format often or dramatically.
When providers publish to a standard you need to engage with them. The
open source community is only one of many providers and unfortunately
not a major one. The bulk of the data in Australia is locked up in
Government Agencies. You need to see if they would like to agree on a
> We'd really like to have an agreed standard that data providers can
> publish to, that is both supported by National Map (and other instances
> of Terria), but is a generally good, reusable format for data that will
> be used for other applications as well (including Excel, leaflet,
> CartoDB, QGIS...)
> ["We" in this case is NICTA, but I have a strong personal interest in
> this, as an Open Knowledge volunteer...]
See my comment above: why is NICTA not making this call rather than you?
> I'm tentatively calling it "Aus-Geo-CSV". I've started writing it up here:
> It's probably best to comment here.
'Here' the @OKau mailing list, or 'here' the GitHub repository?
> Things I'd particularly like to know:
> - feedback on which fields are widely in use (particularly any history
> around why that's the case)
> - feedback on any likely difficulties in following the standard
> - what else should be in there?
> Should we be encouraging .vrt files to be provided?
> Do we need to mention character encodings,
YES, YES, YES (sorry, many many hours of pain here!)
> line endings,
> quoting, etc?
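To illustrate why encodings and line endings deserve a mention, here is
a minimal Python sketch of what a harvester ends up doing when the
encoding is not declared. The function name read_csv_rows and the
fallback order (UTF-8 with optional BOM, then Windows-1252) are
assumptions for illustration only, not anything the draft specifies.

```python
import csv
import io


def read_csv_rows(raw_bytes, encodings=("utf-8-sig", "cp1252")):
    """Decode CSV bytes with the first encoding that works.

    Returns (encoding_used, rows), where rows is a list of dicts.
    The fallback order is a guess, which is exactly the problem an
    explicit statement in the standard would remove.
    """
    for encoding in encodings:
        try:
            text = raw_bytes.decode(encoding)
        except UnicodeDecodeError:
            continue
        # newline="" leaves \r\n and \n untouched so the csv module
        # can deal with line endings (and quoted line breaks) itself.
        reader = csv.DictReader(io.StringIO(text, newline=""))
        return encoding, list(reader)
    raise ValueError("could not decode CSV with any of: " + ", ".join(encodings))
```

If the standard simply said "UTF-8", the guessing loop would disappear
and a decode failure would be a clear validation error.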
> - has someone else already done this,
Debatable since you are only at the beginning.
I have seen and been involved in this being discussed, debated and
implemented at the local, state and federal level over the last 20+
years. Have you spoken with the major government geospatial data
custodians? Have you poked the OSGeo list and other lists involved in
geomatics in Australia and throughout the world?
> - are there other administrative regions that are important to support?
> (I'm obviously focusing the spec on what National Map can and will
> support, but it can certainly be broader than that.)
> To make commenting easier, I'll quote the important bits here, but
> please do read the whole thing. (It will probably have changed since I
> write this...)
The previous statement is ambiguous. Can you please clarify? Where is
'here', and aren't you asking us to comment on your proposal? Do you mean
comments should be made on the text currently available on the GitHub
repository rather than on the text immediately below this comment in
this email?
> In data.gov.au <http://data.gov.au>, datasets that conform to this
> standard SHOULD be tagged "aus-geo-csv". The dataset MUST NOT contain
> CSV files that do not conform (but may contain other non-CSV files).
> * Preferred field names: |lat|, |lon| [the only format currently
> * Accepted field names: |latitude|, |longitude|; |lat|, |lng|
> * Discouraged: |x|, |y|;
> * Avoid: |WKT| (single column with data in |POINT(-37.8
> 144.9)| format); |easting|,|northing
> Each MUST be a number in decimal degrees (EPSG:4326). Numbers SHOULD NOT
> be enclosed in double quotes.
Sounds dumb in this day and age but you need to get the custodian to
declare the datum used! Don't assume GDA94.
Including an EPSG field would help, although from memory these lists vary
between providers (e.g. QGIS, gvSIG and ArcGIS don't have comparable lists).
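As a rough sketch of how a consumer might handle the field names quoted
above together with a declared datum, something like the following
would do. The case-insensitive header matching and the per-row "epsg"
column are assumptions made for illustration; the draft does not
define either.

```python
# Header pairs from the quoted draft, in order of preference.
LAT_LON_NAMES = [("lat", "lon"), ("latitude", "longitude"), ("lat", "lng")]


def find_lat_lon_columns(header):
    """Return the (lat, lon) header names actually used, or None."""
    # Assumption: headers are matched case-insensitively.
    lookup = {name.strip().lower(): name for name in header}
    for lat_name, lon_name in LAT_LON_NAMES:
        if lat_name in lookup and lon_name in lookup:
            return lookup[lat_name], lookup[lon_name]
    return None


def parse_point(row, lat_col, lon_col):
    """Parse decimal degrees and reject values outside the valid range."""
    lat = float(row[lat_col])
    lon = float(row[lon_col])
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range: %r, %r" % (lat, lon))
    return lat, lon


def declared_epsg(row, column="epsg"):
    """Return the EPSG code from a hypothetical per-row column, or None."""
    value = (row.get(column) or "").strip()
    return int(value) if value.isdigit() else None
```

A None from declared_epsg means the custodian has not declared the
datum, and the consumer is back to guessing.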
> A four digit postcode.
> * Preferred field name: |au:postcode|
> * Acceptable field names: |postcode|
> * Discouraged: |poa|
> For greater precision, additional fields |suburb| and |state| MAY be
> provided. For example: |postcode| 3068, |suburb| Clifton Hill, |state| VIC.
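A practical catch with "a four digit postcode" is that spreadsheets
often strip leading zeros, so NT postcodes such as 0800 arrive as 800.
A minimal validation sketch follows; the function name and the decision
to pad short numeric values back to four digits are assumptions, not
part of the draft.

```python
import re

POSTCODE_PATTERN = re.compile(r"\d{4}")


def normalise_postcode(value):
    """Return the postcode as a four-character string, or raise ValueError."""
    text = str(value).strip()
    # Assumption: a short all-digit value lost its leading zeros somewhere
    # (e.g. 0800 saved as the number 800), so pad it back out.
    if text.isdigit() and len(text) < 4:
        text = text.zfill(4)
    if not POSTCODE_PATTERN.fullmatch(text):
        raise ValueError("not a four digit postcode: %r" % (value,))
    return text
```

Whether a consumer should repair such values or reject them outright
is a question the standard could usefully answer.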
> Local Government Area
> * Preferred field name: |au:lga|
> * Acceptable field names: |lga|
> The contents MUST be the short form of the LGA name, with no "City
> of", "Council" etc. For example: "Melbourne", "Greater Geelong".
> It SHOULD be capitalised like this.
> A separate state column MUST be provided, as LGA names are not
> unique across states. A separate |au:lga_code| column SHOULD be
> * Preferred field name: |au:lga_code|
> * Acceptable field name: |lga_code|
> This MUST be the 5 digit code described by the ABS
> For example, Brisbane is 31000. Complete lists are available here
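Since LGA names are not unique across states, a consumer needs either
the 5 digit code or the name plus state to identify a region. A
minimal sketch of that lookup, using the preferred and acceptable field
names from the draft; the helper names and the shape of the returned
key are hypothetical.

```python
import re

LGA_CODE_PATTERN = re.compile(r"\d{5}")


def first_present(row, names):
    """Return the first non-empty value found under any of the field names."""
    for name in names:
        value = (row.get(name) or "").strip()
        if value:
            return value
    return None


def lga_key(row):
    """Build an unambiguous key for the LGA a row refers to."""
    code = first_present(row, ("au:lga_code", "lga_code"))
    if code is not None:
        if not LGA_CODE_PATTERN.fullmatch(code):
            raise ValueError("LGA code must be 5 digits: %r" % code)
        return ("code", code)
    name = first_present(row, ("au:lga", "lga"))
    state = first_present(row, ("au:state", "state"))
    if name is None or state is None:
        raise ValueError("need an LGA code, or both an LGA name and a state")
    # Assumption: state abbreviations are compared in upper case.
    return ("name", state.upper(), name)
```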
> * Preferred field name: |au:state|
> * Acceptable field names: |state|
> * Discouraged: |ste|
> The contents MUST be the two or three-letter form of the state or
> territory ("VIC", "NT").
> /KM: Case-sensitive?/
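On the case-sensitivity question, one option is for the standard to
require upper case while consumers accept any case and normalise it. A
minimal sketch of the consumer side, assuming the usual eight state and
territory abbreviations; the function name is hypothetical.

```python
# The eight state and territory abbreviations in common use.
STATE_CODES = frozenset({"NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT", "ACT"})


def normalise_state(value):
    """Return the upper-case abbreviation, or raise ValueError if unknown."""
    code = value.strip().upper()
    if code not in STATE_CODES:
        raise ValueError("unrecognised state or territory: %r" % value)
    return code
```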
> administrative regions
> These region types are also supported:
> * |sa4|: "Statistical area level 4
> * |sa3|: "Statistical area level 3
> * |sa2|: "Statistical area level 2
> * |sa1|: "Statistical area level 1
> (ABS) [not currently supported by Terria]
> * |ced|: "Commonwealth electoral division
> * |sed|: "State electoral division
> * |ssc|: "State suburbs
> * |cnt2|: "Two letter country codes" (ISO 3166-1 Alpha 2
> * |cnt3|: "Three letter country codes" (ISO 3166-1 Alpha 3