[open-science] Times Higher Education article on opening up research data

Peter Murray-Rust pm286 at cam.ac.uk
Tue Jul 20 05:37:48 UTC 2010


I think this is a real opportunity for the OKF - although it will be a lot
of hard work.

"Open Data" is now hitting us from every side. Five years ago the term was
unknown. Many of the people who are arguing for Open Data (e.g. the THES
article, Climategate, etc.) think Open Data is a trivial matter of pressing a
button and exposing your nicely ordered, recomputable data. Refusal to do so
means you are selfish and uncooperative.

It's not like that. Data are messy, sprawling, incompletely understood. They
are often created or gathered by people who don't understand data. In our
own department many people have "lost data". This isn't because they are
deliberately irresponsible (they aren't) but because the infrastructure
(technical and cultural) isn't there.

I have been aspiring to Open Notebook Science in computational
chemistry - where all the data are exposed at the time of calculation. It's
very difficult technically. The first time I tried it I failed. I'm trying
to work towards Reproducible Computational Chemistry. I am some way away
(though making progress).

Data processing is almost inevitable. In many projects it goes something
like:
RawData => CleanedData => FilteredData
The raw data usually comes from an instrument. It's almost never absolute.
It needs adjusting for calibration, for noise and many other common
artefacts. This is not "munging the data to fit the theory", it is trying to
create an abstraction which is independent of the particular experimental
setup.

Then it needs filtering. A typical example is recognising significant
effects and separating these from noise. Of course noise is *sometimes* a
new scientific effect - e.g. pulsars - but most of the time it is
unexplained noise. We trust the domain expert to clean and filter the data.
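
As a rough illustration of the RawData => CleanedData => FilteredData chain,
here is a minimal Python sketch. The function names, the baseline/gain
calibration and the two-sigma cut-off are all assumptions made up for this
example; a real instrument would need its own domain-specific corrections.

```python
from statistics import mean, stdev


def clean(raw, baseline, gain):
    """RawData => CleanedData: remove a hypothetical instrument offset and gain."""
    return [(x - baseline) / gain for x in raw]


def filter_significant(cleaned, threshold_sigmas=2.0):
    """CleanedData => FilteredData: keep points that stand out from the noise.

    The cut-off (in sample standard deviations) is an illustrative choice,
    the kind of judgement a domain expert would normally make.
    """
    mu = mean(cleaned)
    sigma = stdev(cleaned)
    return [x for x in cleaned if abs(x - mu) > threshold_sigmas * sigma]


# Invented readings: nine noisy values around 10.0 and one large excursion.
raw = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 25.0]
cleaned = clean(raw, baseline=10.0, gain=2.0)
filtered = filter_significant(cleaned)
```

Even in this toy case, each stage depends on choices (baseline, gain,
threshold) that would have to be recorded alongside the data for someone else
to reproduce the filtered result.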

This takes time. There's little (IMO) to be gained by exposing this, any
more than the calibration of microscopes and balances. I'm personally happy
for it to be exposed, but not to find the time and resources to do it.

So I think the OKF is in a unique position to bring clarity and neutrality
into this. One way might be to have Panton discussions on this - with a
range of points of view. Another would be to see OKF white-papers on the
realistic and unrealistic expectations of exposing data. If these were
compelling then they could inform decision-making by funders, publishers,
etc. I liked Chris Rusbridge's comment on the FOI-isation of data. We have
to have some perspective. There has to be a time gap between conception and
exposure, just as there has to be for most other information under FOI. And
I think we'd all agree FOI is a very blunt instrument.

In summary: "Data Is Difficult". The OKF is well positioned to outline those
difficulties carefully and neutrally and provide guidelines for addressing
them.

P.


On Mon, Jul 19, 2010 at 9:08 PM, Lance McKee <lmckee at opengeospatial.org> wrote:

> Re:
>
>
>> so maybe it all comes back to better ways of annotating data.
>>
> I think it all comes down to science funders requiring metadata, and
> requiring metadata that conforms to standards. Each discipline's data
> coordination body, coordinating with other data coordinating bodies, needs
> to look at ISO metadata standards, open source and proprietary metadata
> tools, standards for encoding data and standards for interfaces on software
> that produces or ingests data.
>
> For geospatial data, for example, there are the ISO 19115 and ISO 19119
> metadata standards and OGC standards, including the OGC ("OpenGIS")
> Observations and Measurements Encoding Standard (O&M) (
> http://www.opengeospatial.org/standards/om) .
>
> In a Web services world, we will discover that the distinction between "data
> files" and "metadata files" is an artifact of primitive 20th century
> computer technology. There should be no distinction between these, other
> than distinctions for the parts of a record, such as we have for books:
> front cover and title, copyright page, title page, table of contents,
> dedication page and acknowledgments, preface, introduction, body, footnotes,
> index, glossary, back cover.  Is such a structure too complex for today's
> digital science documents?
>
> Speak up and be bold, data curators, and coordinate like our planetary
> lease depended on it!
>
> Lance
>
> Lance McKee
> Senior Staff Writer
> Open Geospatial Consortium (OGC)
> 508-752-0108
> lmckee at opengeospatial.org
>
> The OGC: International Location Standards
> http://www.opengeospatial.org
>
>
>
> On Jul 19, 2010, at 2:07 PM, Jessy Cowan-Sharp wrote:
>
>  interesting development. one point this article raises is the question of
>> how requirements for release might impact longitudinal studies, and in
>> general makes me wonder, when does something become FoI/FOIA-able? when it's
>> pencil marks in a notebook? when the file has been saved? when a
>> statistically significant number of observations have been made?
>>
>> or put another way, thinking more about sharing than hostile release, when
>> is a data set complete enough to share? when does a set of observations
>> become a data set? are there ways to dictate "in progress" or even "stream
>> data"?
>>
>> one of the biggest arguments against data sharing in science is that those
>> who haven't been intimately involved with the project "wouldn't get it".
>> this seems like a misnomer to me, since lack of availability/exposure to raw
>> data only exacerbates our lack of literacy with it. but especially with
>> charged issues like climate change it's easy to see how sharing can
>> backfire.
>>
>> so maybe it all comes back to better ways of annotating data.
>>
>> anyway, bit of a rhetorical rant i guess, but worth thinking about.
>>
>> jessy
>>
>>
>> On Mon, Jul 19, 2010 at 1:11 PM, Jonathan Gray <jonathan.gray at okfn.org>
>> wrote:
>> Interesting...
>>
>>
>> http://www.timeshighereducation.co.uk/story.asp?sectioncode=26&storycode=412475&c=2
>>
>> --
>> Jonathan Gray
>>
>> Community Coordinator
>> The Open Knowledge Foundation
>> http://blog.okfn.org
>>
>> http://twitter.com/jwyg
>> http://identi.ca/jwyg
>>
>> _______________________________________________
>> open-science mailing list
>> open-science at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-science
>>
>>
>>
>> --
>> Jessy Cowan-Sharp
>> http://jessykate.com
>>
>>
>
>
>
>
>
>



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069