[ok-scotland] Dave Stafford, Stirling Council - Scottish Government Open Data Strategy: Additional considerations - rating system to identify what processes data has been through

Thu Nov 20 13:37:22 UTC 2014

Many thanks for your kind words Dave, and for stimulating discussion. 
I'd like to drop another pebble in the pool, though I may be changing 
the subject slightly.

I don't know if other ok-scotland list members lurk on the Museums 
Computer Group list (MCG at JISCMAIL.AC.UK - lots of very innovative 
people...) but there happens to be an interesting discussion about data 
reusability going on right now. Here's a snippet from Mike Ellis:
> ...
> But: personally I think that not having collections data which is 
> _actually_ part of your CMS content can be part of the "oh, that's the 
> collection, over there" problem. ...[snip]...
>
> Part of the point of ingesting the metadata into the CMS itself is 
> that you can then actually do _rich_ stuff with that content, not just 
> search.
>
> So as examples, you may be..
>
> - writing a web page and want to feature a related object in the sidebar
>
> - developing an online exhibition or game where you want to add 
> related objects into the flow of the narrative. An example is here: 
> http://americanmuseum.org/about-the-museum/exhibitions/gangsters/ - if 
> you scroll down you can see rows of object records, all of which are 
> selected by simply selecting them from a list: the object name, 
> description and title are all pulled in automatically (hey, COPE!)
>
> ...[other examples of data reuse]...
>
At this point the next correspondent mentioned DITA:
> Absolute best practice might be to transform the data into some sort of "CMS" standard "Canonical Data Model" ... Something like (or at least based upon) http://dita.xml.org/ maybe? Then write plugins between that standard and various CMS systems.

These kinds of issues are part of what prompted my "What is data?" 
question. Museum data may be an extreme case but quite a lot of 
organisations have valuable data in "difficult" formats, that it would 
be great to be able to share publicly and exploit for public benefit. 
(Copyright, privacy, IPR and the need to make money permitting, of course.)

A couple of other tangential points:

a) Just to be clear on the "rating of data" scale - by "3*" I was 
referring to the Berners-Lee scale (http://5stardata.info/), which is 
about format rather than quality. To be 5* it has to be Linked Data RDF, 
which is a bit of an ask for "all organisations" (thanks for 
clarifications) by 2017. However, data could be 5* and still pretty 
incoherent/repetitive/dirty.
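
To make the format-versus-quality point concrete, here is a rough Python 
sketch. The Dataset fields and the star_rating function are my own 
illustrative names, not anything defined by 5stardata.info, and the 
duplicate_rows count is just a stand-in for "dirtiness". The rating only 
looks at the five format steps and never at the quality field.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags, one per step of the five-star scale.
    open_licence: bool       # 1*: on the web under an open licence
    structured: bool         # 2*: machine-readable structured data
    open_format: bool        # 3*: non-proprietary format such as CSV
    uses_uris: bool          # 4*: URIs identify things (e.g. RDF)
    linked_to_others: bool   # 5*: links out to other data
    duplicate_rows: int = 0  # a quality measure the scale never inspects


def star_rating(d: Dataset) -> int:
    """Count satisfied steps in order; each star presupposes the ones below."""
    steps = [d.open_licence, d.structured, d.open_format,
             d.uses_uris, d.linked_to_others]
    stars = 0
    for satisfied in steps:
        if not satisfied:
            break
        stars += 1
    return stars


clean_csv = Dataset(True, True, True, False, False, duplicate_rows=0)
dirty_rdf = Dataset(True, True, True, True, True, duplicate_rows=5000)
print(star_rating(clean_csv), star_rating(dirty_rdf))  # prints: 3 5
```

So the tidy CSV stops at 3* while the RDF full of duplicates still gets 
5*, which is all I meant by the scale being about format.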

b) I agree with a lot of what Dave has said (and I am personally 
fascinated by data cleaning - really!) but I do think we have to be wary 
of the best being the enemy of the good.

Kate

-- 
Kate Byrne
School of Informatics, University of Edinburgh
http://homepages.inf.ed.ac.uk/kbyrne3/
location: http://geohash.org/gcvwr2rkb5hd
twitter: @katefbyrne
