[@OKau] [@OKFNau] a perspective from local government

Steve Bennett stevage at gmail.com
Thu Mar 26 23:16:50 UTC 2015

On Wed, Mar 25, 2015 at 8:21 PM, Rebecca Cameron <rcameron.bis at gmail.com>
wrote:

> More than happy to share my experience. I did share the business processes
> established with Qld Gov Departments to get some consistency in open
> data. I am an OK ambassador here in Brisbane, just haven't been able to
> rectify user access issues. More than happy to travel for a meet-up if it
> helps anyone.

Hi Rebecca,
  Thanks so much for sharing your experience - there's still a big shortage
of public stories about the process of releasing open data in Australia. We
would love to have you speak at an Open Knowledge night in Melbourne!

I've done a few brief stints with a couple of government departments and
agencies here (though not at the same depth), so, comparing notes:

> 1. Publish datasets of information already published, as the data has
> already matured and the usability is already known. An easy win is the
> publication of transaction-level machine-readable data of already published
> data such as annual report data.

Yeah, I found that too. When we make a basic list of datasets that could
be released, "published" is a useful column - if the information has
already gone out once, it's a no-brainer to release it as data as well.
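That triage step can be sketched in a few lines of Python - the dataset
names and the structure here are purely illustrative, not from any real
department's list:

```python
# Hypothetical triage list for candidate datasets: if the information is
# already published somewhere, releasing it as data is the lowest-risk win.
candidates = [
    {"dataset": "Annual report financials", "published": True},
    {"dataset": "Service centre locations", "published": True},
    {"dataset": "Internal case management extract", "published": False},
]

# Datasets whose content is already public are the easy first releases.
quick_wins = [c["dataset"] for c in candidates if c["published"]]
print(quick_wins)
```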

> 2. Publish machine-readable forms of website info: lists of internal and
> external service information are easy wins. And external websites will
> source this information quickly.

What kinds of "service information" do you mean? Like contact details?
Locations of service centres? Examples?

> By doing 1 & 2 first the risk to the Department is minimal and the
> Department can see the sky is not going to fall in by publishing on open
> data.

Yeah, surprisingly important. VicRoads often tells the story of how nervous
they were at first, and how it almost came as a surprise that their first
data releases didn't result in catastrophe.

> 4. Get your websites and intranet to start sourcing information from open
> datasets. This reduces time for everyone and creates a single source of
> truth. Once matured also get your existing systems to source from open
> data. This will reduce information management across silo systems and
> improve your data governance. It will also reduce duplication of work, data
> owners and data managers.

Interesting - can you give an example of this? And do you literally mean
staff should be going to an open data portal to access datasets? I've heard
anecdotal examples of people doing this, but hadn't thought of explicitly
encouraging it.
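
As a rough sketch of what "sourcing from open data" could look like:
data.gov.au runs CKAN, so an intranet page could read a dataset's record
from the portal's `package_show` API rather than keeping a second copy.
The payload below is a trimmed, made-up example of CKAN's response shape,
and the dataset name is a placeholder:

```python
import json

# A real page would fetch this JSON from
#   https://data.gov.au/api/3/action/package_show?id=<dataset-id>
# The sample below is a hand-written, cut-down version of that response.
sample_response = json.loads("""
{
  "success": true,
  "result": {
    "name": "example-dataset-id",
    "resources": [
      {"format": "CSV", "url": "https://example.org/data.csv"}
    ]
  }
}
""")

def csv_resources(ckan_response):
    """Return the CSV download URLs from a CKAN package_show response."""
    # CKAN wraps the dataset record in a "result" key.
    result = ckan_response["result"]
    return [r["url"] for r in result["resources"] if r["format"] == "CSV"]

print(csv_resources(sample_response))
```

The point of the design is that the intranet never holds its own copy of
the data - the published dataset stays the single source of truth.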

> 5. Set the standards for datasets. I set up a whole user guide which
> anyone could follow to publish data, even if a non-data person. The guides
> covered everything from managing datasets to ensuring de-identification of
> data and how to align datasets with existing published datasets. I used
> AIHW and ABS standards where possible as this meant the data can be easily
> mashed with federal data but also the Department’s data could be layered
> over itself with interesting results.

Yes! http://opencouncildata.org
(Most of the step-by-step info needed is in data.gov.au's Open Data
Toolkit, which we should probably link to.)

> 6. Make sure you apply mapping and standardise the mapping in the extract
> codes. For example Yes/No is stored as 0,1 in most systems or Y,N. I set a
> standard of extracting for open data publication full text Yes/No, so the
> data is useable even by non-data users.

Yep. I find this can be somewhat tricky for non-trivial cases - trying to
find representations that are both machine readable and human readable. For
instance, some forms of ISO 8601 are easy to read (2014-03-25) but once you
add times and repeating schedules they're not.
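
For the easy cases, the extract-time mapping Rebecca describes is a small
lookup step. A minimal sketch, assuming the source systems encode flags as
0/1 or Y/N (the function name and mapping table are mine):

```python
# Normalise raw system flag codes to the full-text Yes/No standard before
# publication, so the CSV reads cleanly for non-data users and machines.
YES_NO = {"0": "No", "1": "Yes", "N": "No", "Y": "Yes"}

def normalise_flag(raw):
    """Map a source-system flag (0/1 or Y/N) to the published Yes/No form."""
    try:
        return YES_NO[str(raw).strip()]
    except KeyError:
        # Fail loudly rather than publishing an unmapped code.
        raise ValueError(f"Unmapped flag value: {raw!r}")

print(normalise_flag(1), normalise_flag("N"))
```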

> 7. One of the less common open data features I added to the datasets was
> the publication of the data in both CSV and XLSX formats. The CSV was
> API-enabled, while the XLSX contained a notes tab. The notes tab was in a
> standard format and was published for all datasets. The notes tab defined
> the data, the dataset, the data fields etc., giving the user the full
> context of the information. I know the notes tab is used frequently by
> non-government data users, increasing the audience for your information.
> The notes tab also received positive feedback from hackers at last year's
> GovHack.

Interesting - so you're using .xlsx deliberately as a richer format. Would
it have been possible to extract those notes as a separate text document
(or metadata description) describing the fields? Or do you think this was a
better approach?
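
The separate-document alternative might look something like this: the data
stays a plain, API-friendly CSV, and the field notes travel as a sidecar
metadata file alongside it. All field names and notes here are invented
for illustration:

```python
import csv
import io
import json

# Hypothetical field notes - the same content a notes tab would carry,
# expressed as a standalone machine-readable metadata document.
field_notes = {
    "dataset": "Example service transactions",
    "fields": [
        {"name": "service_date", "type": "date", "notes": "ISO 8601, YYYY-MM-DD"},
        {"name": "completed", "type": "text", "notes": "Yes/No"},
    ],
}

# The data itself stays a plain CSV...
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([f["name"] for f in field_notes["fields"]])
writer.writerow(["2015-03-26", "Yes"])

# ...while the notes are published as a companion JSON document.
metadata_doc = json.dumps(field_notes, indent=2)
print(buffer.getvalue())
print(metadata_doc)
```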

> 8. And perhaps the biggest tip is to publish data moving forward. Unless
> there was historical significance to the data, all data commenced
> publication at the current point in time. Where there was a request for
> legacy data, the legacy data would be published, but otherwise data trends
> were built from the point of extraction moving forward. This makes huge
> efficiency savings, as you don't have to align legacy systems or data
> holdings.

Hmm. I notice that when you talk to data consumers about department X
releasing data Y, they generally assume that:
1. It will be released on an ongoing basis, as soon as the data is created -
real-time where applicable. (In practice, there's usually a lag between
data creation and when it even arrives on department/agency servers, and
then it has to be processed.)
2. All historical data will be included, too.

Seems like a sensible default position though. I assume that "moving
forward" includes the current period, and isn't just a commitment to start
releasing data from the next batch?
