[open-science] Making use of data: tools, process and openness (was: Re: Fwd: New edition UKDA guide: Managing and Sharing Data)
rufus.pollock at okfn.org
Sun May 8 12:09:02 UTC 2011
[Changing subject to reflect change of thread direction]
Great comment John. I'd like to expand on it!
In almost all the talks I give I aim, at least once, to
make a statement along the lines of:
"Openness is *not* an end in itself, it's a means to an end"
The *real end* being the more effective creation, processing and use of
information/knowledge for the purpose of bettering,
in some way or other, our own lives and the world around us -- be that
finding a better way to travel to work, improved understanding and
predictions of climate change, a better way to select a stock
portfolio, working out who to vote for ...
Now there is clearly a very large set of things that can contribute
to us getting better at the "creation, processing and use of
information" but I'd argue that the following are particularly
important (clearly these all interlink ...):
1. Scalability -- i.e. to deal with large amounts of material
2. (Improved) Tools, techniques and process
3. Wide access to the raw data and content
[I'd also add a fourth item, though strictly it is a personal desire
rather than a requirement: 4. To do this in a collaborative, distributed and
decentralized manner -- avoiding the centralization of information
'power' (be that in the actual control of information or in 'refining'
Now I'd argue that openness -- both of data/content and of tools -- is
really important to each of these:
1. Scalability - central, in my view, to successful 'data scaling'
will be componentization: breaking up material into maintainable
chunks (components) that can be recombined. However, without openness
recombination will rapidly become extremely hard -- if not impossible
-- as one has to clear rights with all of the different providers of
2. Tools, techniques and process. Open data makes it much easier to
develop and share tools, techniques and processes for working with
3. Wider access to the material: given the vast amount of material
becoming available we're going to want as many people as possible (and
not just 'professionals') to be able to access, experiment with and
redistribute that data as easily as possible (cf. the many minds
principle: the best thing to do with your data will be thought of by
To sum up then: I completely agree with you John that "doing useful
stuff" with the data is the central thing (in some sense the only
thing). We should always make clear that openness is important because
of what it helps us do, not because it is an end in itself.
Furthermore, tooling, annotation, feedback loops etc. are
absolutely central (which is why, I'd estimate, much more than
50% of the Open Knowledge Foundation's time and resource is spent on
developing tools, platforms and techniques, such as CKAN, for working with
(open) data and content).
At the same time, as I've outlined above, I think openness is pretty
central to making significant progress "doing useful stuff". Given
this, I think it is important that we do "go on" about *open* data --
not, of course, in an obsessive ideological are-you-in/are-you-out way
but in an exhortatory, this-matters-to-what-we-can-build way. This is
especially true at the present time, when the default in most areas
still seems to be non-open.
CKAN: http://ckan.net/ and http://ckan.org/ (software)
On 6 May 2011 17:15, john wilbanks <wilbanks at creativecommons.org> wrote:
> this is a trending part of the data conversation that i am seeing worldwide
> - licenses on data are considered a tiny part of overall data management.
> the UK is the most progressive, and the US default position of public domain
> is nice (though we are continuing to press for its being in the PD globally,
> and not just domestically). but when the scientists who are not on this list
> gather to talk data with their funders, IP is rarely at the top of the
> it is of course essential, but far from sufficient, for data to be "open" -
> if it's not annotated, doesn't have provenance, doesn't have tracks of how
> it came from its original raw forms to the intermediate processed forms that
> are so much more useful, doesn't have tracks of the feedback loops that
> processed it, etc.
> i now spend much of my time these days *outside* the open science world, in
> what they call out here the "big data" world. most of the data people i talk
> to are obsessed with all of the ways data is made a) useful via tooling and
> annotation and b) social (in the sense of the feedback loops and
> conversations among big data users, not in the facebook sense) and are not
> very concerned with the openness of data in the absence of a) and b).
> we've got to get serious as a community about addressing these things
> ourselves, or we risk becoming a one-note community obsessed with whether or
> not data is "open" rather than joining the debate about how to make data
> *useful* - and making the open argument a key to that broader one.
> My .02
> On 5/6/2011 4:55 AM, Jonathan Gray wrote:
>> Surprised to see only one mention of "open data" on page 30. A shame that
>> there isn't a brief step by step guide to openly licensing data in the
>> Does anyone know any of the authors of this that we could contact - to see
>> if they'd consider putting this in next time?
>> ---------- Forwarded message ----------
>> The UK Data Archive has just published the 3rd edition of its 'Managing and
>> Sharing Data - best practice for researchers' guide.
>> It is available online at: *
>> Hard copies can be requested from *Communications
>> enquiries*<comms at data-archive.ac.uk>
>> This edition contains much new content, illustrated with numerous case
>> studies. Guidance is aimed at researchers across the natural and social
>> sciences and humanities, and covers:
>> - why and how to share research data
>> - data management planning and costing
>> - documenting data
>> - formatting data
>> - storing data
>> - ethics and consent in data sharing
>> - data copyright
>> - data management strategies for large investments
>> New and updated guidance results from working closely with researchers,
>> centres and programmes within the JISC-funded Data Management Planning for
>> ESRC Research Data-rich Investments (DMP-ESRC) project.
>> The guide is published thanks to funding from the Joint Information Systems
>> Committee (JISC), the Rural Economy and Land Use (Relu) Programme and the
>> UK Data Archive.
>> VAN DEN EYNDEN
>> RESEARCH DATA MANAGEMENT SUPPORT SERVICES & RELU DATA SUPPORT SERVICE
>> UK *DATA ARCHIVE*
>> UNIVERSITY OF ESSEX
>> WIVENHOE PARK
>> ESSEX, CO4 3SQ
>> *T* +44(0)1206 872234; 07768432422
>> *E* *veerle at essex.ac.uk*
>> *W **www.data-archive.ac.uk*<http://www.data-archive.ac.uk/>
>> ENSURING CONTINUOUS ACCESS TO HIGH QUALITY RESEARCH DATA
>> open-science mailing list
>> open-science at lists.okfn.org
> John Wilbanks
> VP for Science
> Creative Commons
> web: http://creativecommons.org/science
> blog: http://scienceblogs.com/commonknowledge
> twitter: @wilbanks
Co-Founder, Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/