[ok-scotland] Dave Stafford, Stirling Council - Scottish Government Open Data Strategy: Additional considerations - rating system to identify what processes data has been through

David Stafford staffordd at stirling.gov.uk
Fri Nov 21 14:28:43 UTC 2014


Hello Kate, and thanks very much for this. Just so you know, I was aware that you were referring to the Berners-Lee scale; in my excitement, I was using similar numeric representations to denote experimental data quality ratings - and of course, the two are completely unrelated.
 
I do believe, though, that while the Berners-Lee scale is the absolute standard for format and usability, we need a concurrent "rating" - perhaps to be called the "Local Government Scale" or even the "Kate Byrne Scale" (oh, I like that one!) - that does for data quality, accuracy, up-to-date-ness, reliability and trustworthiness what the Berners-Lee scale does for formats.
 
Knowing the format alone doesn't help us - we need to know both the format and the content of the content, or the information about the information. To me, what is the point, really, of having poor/dirty/uncleansed/half-baked information formatted beautifully to a Berners-Lee "5"? It seems a real waste. If B-L "5" is so difficult to achieve, then why not make it mandatory that if you are going to go to the expense of producing data sets that are Berners-Lee "5", you will at the same time demand that they be at LEAST an "8" on our new "Kate Byrne Scale"? (So if we have the B-L scale for formats, we now have the KB scale for data quality - and everyone is happy!)
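
To make the two-scale idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the "KB scale" does not exist yet, so the 0-10 range, the field names and the function name are hypothetical. The only things taken from the discussion are the Berners-Lee 1-5 star range and the "at LEAST an 8" rule for 5-star data.

```python
from dataclasses import dataclass

# Hypothetical threshold: "at LEAST an 8", read here as 8 out of an assumed 10.
MIN_QUALITY_FOR_FIVE_STAR = 8


@dataclass
class DatasetRating:
    name: str
    format_stars: int   # Berners-Lee 5-star scale, 1..5 (format only)
    quality_score: int  # hypothetical "KB scale", assumed 0..10 (quality only)

    def __post_init__(self):
        # Keep the two scales separate and reject out-of-range values.
        if not 1 <= self.format_stars <= 5:
            raise ValueError("format_stars must be between 1 and 5")
        if not 0 <= self.quality_score <= 10:
            raise ValueError("quality_score must be between 0 and 10")


def ready_to_publish(rating: DatasetRating) -> bool:
    """Gate 5-star publication on data quality; lower formats pass ungated."""
    if rating.format_stars == 5:
        return rating.quality_score >= MIN_QUALITY_FOR_FIVE_STAR
    return True  # this sketch applies no quality gate below 5 stars
```

The point of keeping two separate fields is that a data set can score well on one scale and badly on the other, which is exactly the case the gate is meant to catch.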
 
To your other topic, data re-use: this is something I have championed and argued for over the past seven years. I've been doing creative, powerful data re-use projects ever since I learned how to 15 years ago, and to me it's essential that I keep whatever standard data I need to hand, so that when I work I can use known good tables and ancillary data to inform, improve and enhance the ordinary reports I am working on - I can't imagine NOT having those libraries available.
 
Making them widely available would be absolutely brilliant, because everyone could benefit from the collective creativity of database / content experts who have taken the time to create tables of standard, useful data for the express purpose of fuelling creative data re-use. Museums are not the only place where such re-use can take place - almost any company or organisation can benefit from this process, if it has someone who is willing to collect a few data sets of powerful information and then infuse and cross-breed them with other statistical or historical data, to produce utterly unique reports that would be inconceivable were it not for the stored data sets set aside by the most savvy, creative content creators.

It's a brilliant idea, and if we can get past the DPA and FOI concerns around this, we can unleash the awesome power of not just ancillary data sets but also the raw data used within calculated processes for reporting, saved over a period of years. You can then go back and re-use the detailed inputs, interim data sets and outputs of a series of similar databases across a decade, to identify worrying trends or possibly-previously-undetected revenue streams, as well as to produce a wealth of interesting statistics and a historical overview of that data set - and that, to me, is where data re-use can get very, very interesting indeed!
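
As a rough illustration of the "across a decade" idea, the sketch below estimates a yearly trend from saved report outputs using an ordinary least-squares slope. The function name is hypothetical and the figures are made up purely to show the mechanics; they are not real council data.

```python
from typing import Dict


def yearly_trend(values_by_year: Dict[int, float]) -> float:
    """Least-squares slope: the average change in the value per year."""
    years = sorted(values_by_year)
    if len(years) < 2:
        raise ValueError("need at least two years to estimate a trend")
    n = len(years)
    mean_year = sum(years) / n
    mean_value = sum(values_by_year[y] for y in years) / n
    numerator = sum(
        (y - mean_year) * (values_by_year[y] - mean_value) for y in years
    )
    denominator = sum((y - mean_year) ** 2 for y in years)
    return numerator / denominator


# Made-up figures standing in for one output saved from ten annual reports.
saved_outputs = {2005 + i: 100.0 + 3.0 * i for i in range(10)}
slope = yearly_trend(saved_outputs)
print(f"average change per year: {slope:.2f}")
```

A positive or negative slope is only a first hint; the saved inputs and interim data sets are what let you go back and find out why the figure moved.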
 
I say bravo to re-using data. Any concerns we used to have must be outweighed by the sheer excitement of what can be DONE during the re-use of data. I've personally enjoyed using various high-quality data collections from time to time at various places of employment or, better still, creating my own. So if I say to myself, "if only I had a data set that contained xyz....", then instead of just dreaming, I go ahead and spend a week and BUILD my dream database collection. Then I can benefit from that collection over and over again, and oftentimes I can find new projects that benefit from being cross-pollinated with the xyz or other data collections. They can be incredibly powerful creative forces, and taking data from across a decade of work and using it to power the discovery and analysis of trends - there is not much more exciting in the world of data analysis, in my humble opinion.
 
 
Thank you again, Kate. It was really your questions that stimulated this entire debate, but I very much wanted to weigh in on both counts: on my poorly stated arguments for a data quality scale, including the apparent "mixing up" of that scale with the Berners-Lee scale (which was NOT my intention - I know the difference well; for one thing, the Berners-Lee scale exists, while our new KB Scale is not quite with us yet....), and with my very enthusiastic vote for data re-use. Thank you, too, for repeating Mike Ellis' comments from the museum forum - they were invaluable, and really sparked some wonderful memories for me.
 
This is all good - please, keep going!
 
 
Thank you
 
Dave

>>> Kate Byrne <k.byrne at ed.ac.uk> 20/11/2014 13:37 >>>
Many thanks for your kind words Dave, and for stimulating discussion. 
I'd like to drop another pebble in the pool, though I may be changing 
the subject slightly.

I don't know if other ok-scotland list members lurk on the Museums 
Computer Group list (MCG at JISCMAIL.AC.UK - lots of very innovative 
people...) but there happens to be an interesting discussion about data 
reusability going on right now. Here's a snippet from Mike Ellis:
> ...
> But: personally I think that not having collections data which is 
> _actually_ part of your CMS content can be part of the "oh, that's the 
> collection, over there" problem. ...[snip]...
>
> Part of the point of ingesting the metadata into the CMS itself is 
> that you can then actually do _rich_ stuff with that content, not just 
> search.
>
> So as examples, you may be..
>
> - writing a web page and want to feature a related object in the sidebar
>
> - developing an online exhibition or game where you want to add 
> related objects into the flow of the narrative. An example is here: 
> http://americanmuseum.org/about-the-museum/exhibitions/gangsters/ - if 
> you scroll down you can see rows of object records, all of which are 
> selected by simply selecting them from a list: the object name, 
> description and title are all pulled in automatically (hey, COPE!)
>
> ...[other examples of data reuse]...
>
At this point the next correspondent mentioned DITA:
> Absolute best practice might be to transform the data into some sort of "CMS" standard "Canonical Data Model" ... Something like (or at least based upon) http://dita.xml.org/ maybe? Then write plugins between that standard and various CMS systems.

These kinds of issues are part of what prompted my "What is data?" 
question. Museum data may be an extreme case but quite a lot of 
organisations have valuable data in "difficult" formats, that it would 
be great to be able to share publicly and exploit for public benefit. 
(Copyright, privacy, IPR and the need to make money permitting, of course.)

A couple of other tangential points:

a) Just to be clear on the "rating of data" scale - by "3*" I was 
referring to the Berners-Lee scale (http://5stardata.info/), which is 
about format rather than quality. To be 5* it has to be Linked Data RDF, 
which is a bit of an ask for "all organisations" (thanks for 
clarifications) by 2017. However, data could be 5* and still pretty 
incoherent/repetitive/dirty.

b) I agree with a lot of what Dave has said (and I am personally 
fascinated by data cleaning - really!) but I do think we have to be wary 
of the best being the enemy of the good.

Kate


-- 
Kate Byrne
School of Informatics, University of Edinburgh
http://homepages.inf.ed.ac.uk/kbyrne3/
location: http://geohash.org/gcvwr2rkb5hd
twitter: @katefbyrne

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.


