[open-science] Making use of data: tools, process and openness (was: Re: Fwd: New edition UKDA guide: Managing and Sharing Data)

Jonathan Gray jonathan.gray at okfn.org
Mon May 9 13:01:07 UTC 2011

John, Rufus: lots of good stuff in your emails. Wonder if either of
you fancy bashing any of these ideas into blog posts for
blog.okfn.org? Could be a lovely couplet! :-D


  * "Open data is not enough" or "Science doth not thrive upon open
data alone" (John's original post)
  * "Why openness matters" (Rufus's four point piece below)

Also think it would be great to have more public call/response on some
of the key issues that the OKF works on (another big one being: RDF -


On Sun, May 8, 2011 at 1:09 PM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> [Changing subject to reflect change of thread direction]
> Great comment John. I'd like to expand on it!
> In almost all the talks I give (e.g. [1]) I aim, at least once, to
> make a statement along the lines of:
> "Openness is *not* an end in itself, it's a means to an end"
> The *real end* being the creation, processing and use of
> information/knowledge more effectively for the purpose of bettering,
> in some way or other, our own lives and the world around us -- be that
> finding a better way to travel to work, improved understanding and
> predictions of climate change, a better way to select a stock
> portfolio, working out who to vote for ...
> Now there is clearly a very large set of things that can contribute
> to us getting better at the "creation, processing and use of
> information" but I'd argue that the following are particularly
> important (clearly these all interlink ...):
> 1. Scalability -- i.e. to deal with large amounts of material
> 2. (Improved) Tools, techniques and process
> 3. Wide access to the raw data and content
> [I'd also add a fourth item, though strictly it isn't a requirement but a
> personal desire: 4. To do this in a collaborative, distributed and
> decentralized manner -- avoiding the centralization of information
> 'power' (be that in the actual control of information or in 'refining'
> (processing))]
> Now I'd argue that openness -- both of data/content and of tools -- is
> really important to each of these:
> 1. Scalability - central, in my view, to successful 'data scaling'
> will be componentization: breaking up material into maintainable
> chunks (components) that can be recombined. However, without openness
> recombination will rapidly become extremely hard -- if not impossible
> -- as one has to clear rights with all of the different providers of
> data.
> 2. Tools, technique and process. Open data makes it much easier to
> develop and share tools, techniques and processes for working with
> data.
> 3. Wider access to the material: given the vast amount of material
> becoming available we're going to want as many people as possible (and
> not just 'professionals') to be able to access, experiment with and
> redistribute that data as easily as possible (cf. the many minds
> principle: the best thing to do with your data will be thought of by
> someone else).
> To sum up then: I completely agree with you John that "doing useful
> stuff" with the data is the central thing (in some sense the only
> thing). We should always make clear that openness is important because
> of what it helps us do not because it is an end in itself.
> Furthermore, tooling, annotation, feedback loops [2] etc are
> absolutely central (which is why I'd estimate that much more than
> 50% of the Open Knowledge Foundation's time and resource is spent on
> developing tools, platforms, techniques like CKAN [2] for working with
> (open) data and content).
> At the same time, as I've outlined above, I think openness is pretty
> central to making significant progress "doing useful stuff". Given
> this, I think it is important that we do "go on" about *open* data --
> not, of course, in an obsessive ideological are-you-in/are-you-out way
> but in an exhortatory, this-matters-to-what-we-can-build way. This is
> especially true at the present time, when the default in most areas
> still seems to be non-open.
> Rufus
> [1]: http://m.okfn.org/files/talks/ccc_20091228/
> [2]: http://ckan.net/ and http://ckan.org/ (software)
> On 6 May 2011 17:15, john wilbanks <wilbanks at creativecommons.org> wrote:
>> this is a trending part of the data conversation that i am seeing worldwide
>> - licenses on data are considered a tiny part of overall data management.
>> the UK is the most progressive, and the US default position of public domain
>> is nice (though we are continuing to press for its being in the PD globally,
>> and not just domestically). but when the scientists who are not on this list
>> gather to talk data with their funders, IP is rarely at the top of the
>> agenda.
>> it is of course essential, but far from sufficient, for data to be "open" -
>> if it's not annotated, doesn't have provenance, doesn't have tracks of how
>> it came from its original raw forms to the intermediate processed forms that
>> are so much more useful, doesn't have tracks of the feedback loops that
>> processed it, etc.
>> i now spend much of my time these days *outside* the open science world, in
>> what they call out here the "big data" world. most of the data people i talk
>> to are obsessed with all of the ways data is made a) useful via tooling and
>> annotation and b) social (in the sense of the feedback loops and
>> conversations among big data users, not in the facebook sense) and are not
>> very concerned with the openness of data in the absence of a) and b).
>> we've got to get serious as a community about addressing these things
>> ourselves, or we risk becoming a one-note community obsessed with whether or
>> not data is "open" rather than joining the debate about how to make data
>> *useful* - and making the open argument a key to that broader one.
>> My .02
>> jtw
>> On 5/6/2011 4:55 AM, Jonathan Gray wrote:
>>> Surprised to see only one mention of "open data" on page 30. A shame that
>>> there isn't a brief step by step guide to openly licensing data in the
>>> guide!
>>> Does anyone know any of the authors of this that we could contact - to see
>>> if they'd consider putting this in next time?
>>> J.
>>> ---------- Forwarded message ----------
>>> The UK Data Archive has just published the 3rd edition of its 'Managing
>>> and Sharing Data - best practice for researchers' guide.
>>> It is available online at:
>>> http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
>>> Hard copies can be requested from Communications enquiries
>>> <comms at data-archive.ac.uk>.
>>> This edition contains much new content, illustrated with numerous case
>>> studies. Guidance is aimed at researchers across the natural and social
>>> sciences and humanities, and covers:
>>>    - why and how to share research data
>>>    - data management planning and costing
>>>    - documenting data
>>>    - formatting data
>>>    - storing data
>>>    - ethics and consent in data sharing
>>>    - data copyright
>>>    - data management strategies for large investments
>>> New and updated guidance results from working closely with researchers,
>>> centres and programmes within the JISC-funded Data Management Planning for
>>> ESRC Research Data-rich Investments (DMP-ESRC) project.
>>> The guide is published thanks to funding from the Joint Information
>>> Systems Committee (JISC), the Rural Economy and Land Use (Relu)
>>> Programme and the UK Data Archive.
>>> VEERLE
>>> ESSEX, CO4 3SQ
>>> T +44(0)1206 872234; 07768432422
>>> E veerle at essex.ac.uk
>>> W www.data-archive.ac.uk
>>> _______________________________________________
>>> open-science mailing list
>>> open-science at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/open-science
>> --
>> John Wilbanks
>> VP for Science
>> Creative Commons
>> web: http://creativecommons.org/science
>> blog: http://scienceblogs.com/commonknowledge
>> twitter: @wilbanks
> --
> Co-Founder, Open Knowledge Foundation
> Promoting Open Knowledge in a Digital Age
> http://www.okfn.org/ - http://blog.okfn.org/

Jonathan Gray

Community Coordinator
The Open Knowledge Foundation

