[@OKau] Standard format for publishing CSV spatial data on data.gov.au - feedback, comments?

Craig Molyneux Craig.Molyneux at spatialvision.com.au
Tue May 5 06:35:32 UTC 2015


Hi Steve,

This is great work. Something like this should have been developed decades ago.

I’m assuming from the structure you’ve indicated here that you’re considering only the major continental Australian regions. Rather than ‘state’, I think the column should be something like adm1, as these are first-order administrative regions and would then include all of Australia’s continental and offshore territories, such as the Cocos (Keeling) Islands, Christmas Island, Norfolk Island, etc. Likewise, lga would be better represented as adm2, i.e. second-order administrative regions. This is more consistent with other international datasets.

The ‘Other admin regions’ you refer to are not really ‘administrative regions’ in the same sense as states, territories and LGAs; rather, they are regions defined for statistical or other purposes and, as a result, are prone to change on a regular cycle.

Also, if you’re going to truncate the full names of LGAs, then there should be an adm2type column to indicate whether each is a City, Shire, Regional Council, etc.
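To make the adm2 / adm2type idea concrete, here is a rough Python sketch of how a full LGA name might be split into the two columns. The function name and the list of type words are my own assumptions for illustration (the list is certainly incomplete), and names such as "X City Council" are deliberately left unsplit.

```python
# Assumed, incomplete list of LGA type words, longest first so that
# "Regional Council" and "Rural City" are tried before "City".
LGA_TYPES = ["Regional Council", "Rural City", "City", "Shire", "Borough", "Town"]

def split_lga_name(full_name):
    """Split a full LGA name into (adm2, adm2type).

    Handles "City of Melbourne" and "Cairns Regional Council" style names.
    Returns (name, None) when no known type word is found.
    """
    name = " ".join(full_name.split())  # collapse stray whitespace
    lowered = name.lower()
    for lga_type in LGA_TYPES:
        prefix = lga_type.lower() + " of "
        suffix = " " + lga_type.lower()
        if lowered.startswith(prefix):
            return name[len(prefix):], lga_type
        if lowered.endswith(suffix):
            return name[:-len(suffix)], lga_type
    return name, None

print(split_lga_name("City of Melbourne"))
print(split_lga_name("Cairns Regional Council"))
```

A name that is already in short form comes back with a type of None, so the split can be applied to a column that mixes both styles.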

See the Global Administrative Areas Database here http://data.fao.org/map?entryId=48a5cab8-1f15-41ff-b530-673580735373&tab=about for more info.

Regards,

Craig

From: Steve Bennett [mailto:stevage at gmail.com]
Sent: Tuesday, 5 May 2015 4:10 PM
To: OKFN-AU mailing list
Subject: [@OKau] Standard format for publishing CSV spatial data on data.gov.au - feedback, comments?

Hi all,
  By day, I work on National Map<http://nationalmap.gov.au>, which scours data.gov.au<http://data.gov.au> and other state and federal open data portals (including local government data) for open spatial data. It lets you choose which of these datasets to view on the map, primarily to support quick assessment of a dataset's value or to answer basic questions.

Now, we'd like to get a bit more systematic about harvesting CSV data. There are two main types:
1) Point data described by latitude and longitude
2) Administrative region described by reference to a known region such as a suburb, postcode, local government area (LGA), ABS statistical area etc.

(Arbitrary line and polygon features are out of scope for now - they're better published as GeoJSON in any case).

We'd really like to have an agreed standard that data providers can publish to, one that is both supported by National Map (and other instances of Terria) and a generally good, reusable format for data that will be used in other applications as well (including Excel, Leaflet, CartoDB, QGIS...)

["We" in this case is NICTA, but I have a strong personal interest in this, as an Open Knowledge volunteer...]

I'm tentatively calling it "Aus-Geo-CSV". I've started writing it up here:

https://github.com/NICTA/nationalmap/wiki/Aus-Geo-CSV-standard-(proposed)

It's probably best to comment here.

Things I'd particularly like to know:
- feedback on which fields are widely in use (particularly any history around why that's the case)
- feedback on any likely difficulties in following the standard
- what else should be in there? Should we be encouraging .vrt files to be provided? Do we need to mention character encodings, line endings, quoting, etc?
- has someone else already done this, and better?
- are there other administrative regions that are important to support? (I'm obviously focusing the spec on what National Map can and will support, but it can certainly be broader than that.)

To make commenting easier, I'll quote the important bits here, but please do read the whole thing. (It will probably have changed since I wrote this...)

In data.gov.au<http://data.gov.au>, datasets that conform to this standard SHOULD be tagged "aus-geo-csv". The dataset MUST NOT contain CSV files that do not conform (but may contain other non-CSV files).

Latitude/longitude

  *   Preferred field names: lat, lon [the only format currently supported]
  *   Acceptable field names: latitude, longitude; lat, lng
  *   Discouraged: x, y
  *   Avoid: WKT (single column with data in POINT(-37.8 144.9) format); easting,northing

Each MUST be a number in decimal degrees (EPSG:4326). Numbers SHOULD NOT be enclosed in double quotes.
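As a rough illustration of what a consumer of this format might do, here is a minimal Python sketch that picks the latitude/longitude columns by the names listed above and parses each row. The function names, the case-insensitive header matching and the sample row are my assumptions, not part of the draft.

```python
import csv
import io

# Column-name pairs from the list above, in order of preference.
LAT_LON_NAMES = [("lat", "lon"), ("latitude", "longitude"), ("lat", "lng")]

def find_lat_lon_columns(header):
    """Return the (lat, lon) column names found in the header, or None."""
    # Matching ignores case and surrounding spaces (an assumption).
    lowered = {name.strip().lower(): name for name in header}
    for lat_name, lon_name in LAT_LON_NAMES:
        if lat_name in lowered and lon_name in lowered:
            return lowered[lat_name], lowered[lon_name]
    return None

def parse_point(row, lat_col, lon_col):
    """Parse one row into (lat, lon) floats; raise ValueError if invalid."""
    lat = float(row[lat_col])
    lon = float(row[lon_col])
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    return lat, lon

# Illustrative data only.
sample = "name,lat,lon\nClifton Hill,-37.79,144.99\n"
reader = csv.DictReader(io.StringIO(sample))
lat_col, lon_col = find_lat_lon_columns(reader.fieldnames)
points = [parse_point(row, lat_col, lon_col) for row in reader]
print(points)
```

The range check is a cheap way to catch files where easting/northing values or swapped columns have been published under lat/lon names.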

Postcode

A four-digit postcode.

  *   Preferred field name: au:postcode
  *   Acceptable field names: postcode
  *   Discouraged: poa

For greater precision, additional fields suburb and state MAY be provided. For example: postcode 3068, suburb Clifton Hill, state VIC.
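One practical wrinkle: postcodes that start with 0 (for example 0800 for Darwin) tend to lose the leading zero when a tool treats the column as a number. The Python sketch below shows one possible way to validate and repair values; the function name and the decision to zero-pad three-digit values are my assumptions, not something the draft specifies.

```python
import re

def normalise_postcode(value):
    """Return a four-character postcode string, or raise ValueError."""
    text = str(value).strip()
    # Assumption: a three-digit value is a postcode whose leading zero
    # was dropped by a spreadsheet or numeric parser.
    if re.fullmatch(r"[0-9]{3}", text):
        text = text.zfill(4)
    if not re.fullmatch(r"[0-9]{4}", text):
        raise ValueError(f"not a four-digit postcode: {value!r}")
    return text

print(normalise_postcode("3068"))
print(normalise_postcode(800))
```

Keeping the result as a string, rather than an integer, avoids reintroducing the problem downstream.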

Local Government Area

By name

  *   Preferred field name: au:lga
  *   Acceptable field names: lga

The contents MUST be the short form of the LGA name, with no "City of", "Council" etc. For example: "Melbourne", "Greater Geelong". It SHOULD be capitalised like this.

A separate state column MUST be provided, as LGA names are not unique across states. A separate au:lga_code column SHOULD be provided.
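To show how the MUST and SHOULD above could be checked mechanically, here is a small Python sketch that inspects a CSV header. The function name, the message wording and the choice to accept the discouraged "ste" as a state column are my assumptions.

```python
def check_lga_name_header(header):
    """Return (errors, warnings) for a header that identifies LGAs by name."""
    columns = {name.strip().lower() for name in header}
    errors, warnings = [], []
    if not columns & {"au:lga", "lga"}:
        errors.append("no LGA name column (au:lga or lga)")
    # "ste" is discouraged but treated here as a state column (assumption).
    if not columns & {"au:state", "state", "ste"}:
        errors.append("MUST: a state column is required alongside LGA names")
    if not columns & {"au:lga_code", "lga_code"}:
        warnings.append("SHOULD: an au:lga_code column is recommended")
    return errors, warnings

print(check_lga_name_header(["au:lga", "value"]))
```

Separating errors from warnings mirrors the MUST / SHOULD distinction, so a harvester could reject on the first and merely log the second.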

By ID

  *   Preferred field name: au:lga_code
  *   Acceptable field name: lga_code

This MUST be the 5 digit code described by the ABS<http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1270.0.55.003July%202011>. For example, Brisbane is 31000. Complete lists are available here<http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1270.0.55.003July%202011>.
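A simple check of the code format might look like the Python sketch below. The digit-to-state table reflects my understanding that the first digit of an ABS LGA code is the ABS state/territory code (which is consistent with Brisbane being 31000); treat that table, the "OT" label for Other Territories and the function name as assumptions to verify against the ABS lists.

```python
import re

# Assumed mapping from the first digit of an LGA code to state/territory.
STATE_BY_FIRST_DIGIT = {
    "1": "NSW", "2": "VIC", "3": "QLD", "4": "SA", "5": "WA",
    "6": "TAS", "7": "NT", "8": "ACT", "9": "OT",
}

def state_from_lga_code(code):
    """Validate a five-digit LGA code and return the state it implies.

    Returns None if the first digit is not in the assumed table.
    """
    text = str(code).strip()
    if not re.fullmatch(r"[0-9]{5}", text):
        raise ValueError(f"not a five-digit LGA code: {code!r}")
    return STATE_BY_FIRST_DIGIT.get(text[0])

print(state_from_lga_code("31000"))
```

If the table holds, the implied state could also be cross-checked against a separate state column when both are present.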

State

  *   Preferred field name: au:state
  *   Acceptable field names: state
  *   Discouraged: ste

The contents MUST be the two- or three-letter abbreviation of the state or territory ("VIC", "NT").

KM: Case-sensitive?
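Since case sensitivity is still an open question, a validator could expose it as a switch. The Python sketch below is one way to do that; the function name, the default of case-insensitive matching and the set of eight abbreviations (offshore territories are not covered) are my assumptions.

```python
# The six states and two mainland territories; offshore territories omitted.
STATES = {"NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT", "ACT"}

def normalise_state(value, case_sensitive=False):
    """Return the upper-case state abbreviation, or raise ValueError."""
    text = value.strip()
    if not case_sensitive:
        text = text.upper()
    if text not in STATES:
        raise ValueError(f"unknown state or territory: {value!r}")
    return text

print(normalise_state("vic"))
```

With case_sensitive=True, a value such as "Vic" is rejected, which is the stricter reading of the rule above.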

Other administrative regions

These region types are also supported:

  *   sa4: "Statistical area level 4<http://www.abs.gov.au/ausstats/abs@.nsf/0/B01A5912123E8D2BCA257801000C64F2>" (ABS)
  *   sa3: "Statistical area level 3<http://www.abs.gov.au/ausstats/abs@.nsf/Latestproducts/E7369D1FCE596315CA257801000C64E5>" (ABS)
  *   sa2: "Statistical area level 2<http://www.abs.gov.au/ausstats/abs@.nsf/Latestproducts/88F6A0EDEB8879C0CA257801000C64D9>" (ABS)
  *   sa1: "Statistical area level 1<http://www.abs.gov.au/ausstats/abs@.nsf/0/7CAFD05E79EB6F81CA257801000C64CD>" (ABS) [not currently supported by Terria]
  *   ced: "Commonwealth electoral division<http://www.abs.gov.au/ausstats/abs@.nsf/0/9C8331F55896F9C5CA2578D40012CF99?opendocument>" (ABS)
  *   sed: "State electoral division<http://www.abs.gov.au/ausstats/abs@.nsf/Latestproducts/94496C7EA68A1522CA2578D40012CFB8>" (ABS)
  *   ssc: "State suburbs<http://www.abs.gov.au/AUSSTATS/abs@.nsf/Previousproducts/2C6132C0B332C336CA2578D40012CF76>" (ABS)
  *   cnt2: "Two letter country codes" (ISO 3166-1 Alpha 2<https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2>)
  *   cnt3: "Three letter country codes" (ISO 3166-1 Alpha 3<https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3>)
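
As a sketch of how a tool might recognise these region types, the Python below scans a CSV header for the keys in the list above. It assumes the keys are used directly as column names and matched case-insensitively, which the draft does not yet state; the function name is also hypothetical.

```python
# Keys and descriptions copied from the list above.
OTHER_REGION_TYPES = {
    "sa4": "Statistical area level 4",
    "sa3": "Statistical area level 3",
    "sa2": "Statistical area level 2",
    "sa1": "Statistical area level 1",
    "ced": "Commonwealth electoral division",
    "sed": "State electoral division",
    "ssc": "State suburbs",
    "cnt2": "Two letter country codes",
    "cnt3": "Three letter country codes",
}

def detect_region_columns(header):
    """Return [(column_name, description)] for recognised region columns."""
    found = []
    for name in header:
        key = name.strip().lower()  # case-insensitive match (assumption)
        if key in OTHER_REGION_TYPES:
            found.append((name, OTHER_REGION_TYPES[key]))
    return found

print(detect_region_columns(["SA2", "population"]))
```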