[open-science] Times Higher Education article on opening up research data

Peter Murray-Rust pm286 at cam.ac.uk
Tue Jul 20 15:32:15 UTC 2010


This is great and might need a separate thread.

On Tue, Jul 20, 2010 at 1:47 PM, Lance McKee <lmckee at opengeospatial.org> wrote:

> Peter,
>
> Thank you for your reply. It's certainly true in the geospatial world that
> "data is difficult".
>
> I would be very interested to know what you think of the OGC Observations
> and Measurements Encoding Standard (
> http://www.opengeospatial.org/standards/om ). It defines an abstract model
> and an XML schema [www.w3.org/XML/Schema] encoding for observations and it
> provides support for common sampling strategies. O&M also provides a general
> framework for systems that deal in technical measurements in science and
> engineering. This is one of the OGC Sensor Web Enablement (SWE) [
> http://www.opengeospatial.org/ogc/markets-technologies/swe] suite of
> standards, which have applicability outside the geospatial domain.
>

Having developed Chemical Markup Language I am fully aware of the problems.
I have spoken with OGC people from time to time. XML is useful for some
parts, but tends to enforce a rigid approach.

I am keen on a microformat-like approach to schemas unless the community
knows very clearly what it is doing and agrees.

>
> I am a writer and promoter of ideas, not a scientist or technical expert,
> but I understand the issues well enough to suggest with conviction that the
> OGC provides both models of deliberative process and models of technical
> approaches that the Open Data community could usefully adapt as planks in
> their platform. (OGC standards and participation have proven useful to
> OneGeology http://www.onegeology.org/ and efforts in ocean observation,
> meteorology & climate, and hydrology.) Demonstrating solutions to the
> technical "data publish/discover/assess/access/use" problems -- "See? It
> works!" -- puts pressure on scientists and the institutions of science to
> collaboratively develop new policies and practices.
>

I think each domain is different. In geoscience you have funded projects
bridging many institutions and there is a requirement to use common
approaches. There is the problem that vendors try to create lock-in.

In chemistry we have a Holy Roman Empire of feudal, fighting commercial and
proprietary companies and pseudo-companies (the Am. Chem. Soc. is the worst).
They wish to own and sell data and control metadata. Result: the technology
is years behind where it should be. We have set up the BlueObelisk as a
centre for chemical open source http://www.blueobelisk.org . We have no
support except ourselves but we now produce the best software and will
inexorably win through that.

>
> As you note, the term "Open Data" is going mainstream, and there is
> considerable confusion about what it means. The OGC began as "Open GIS
> Consortium" and then renamed itself "Open Geospatial Consortium" and we are
> constantly educating audiences about the differences between "open
> standards" and "open source", encodings and data models, consensus-derived
> open interfaces and proprietary open interfaces, etc. Considerable thought
> and effort goes into "positioning" (as marketers say) one's organization and
> the organization's mission and products. It's all part of shaping the human
> world.
>
>
> Lance
>
> Lance McKee
> Senior Staff Writer
> Open Geospatial Consortium (OGC)
> 508-752-0108
> lmckee at opengeospatial.org
>
> The OGC: International Location Standards
> http://www.opengeospatial.org
>
>
FOR OKF:
There could be a lot of value in summarising the main Open efforts in each
domain so people could review the problems. I've had another mail today
asking about data mining and I would love to be able to present an OKF
position. If I can, I will start another thread.


>
>
> On Jul 20, 2010, at 1:37 AM, Peter Murray-Rust wrote:
>
>> I think this is a real opportunity for the OKF - although it will be a lot
>> of hard work.
>>
>> "Open Data" is now hitting us from every side. Five years ago the term was
>> unknown. Many of the people who are arguing for Open Data (e.g. the THES
>> article, Climategate, etc.) think Open Data is a trivial matter of pressing a
>> button and exposing your nicely ordered, recomputable data. Refusal to do so
>> means you are selfish and uncooperative.
>>
>> It's not like that. Data are messy, sprawling, incompletely understood.
>> They are often created or gathered by people who don't understand data. In
>> our own department many people have "lost data". This isn't because they are
>> deliberately irresponsible (they aren't) but because the infrastructure
>> (technical and cultural) isn't there.
>>
>> I have been trying to aspire to Open Notebook Science in computational
>> chemistry - where all the data are exposed at the time of calculation. It's
>> very difficult technically. The first time I tried it I failed. I'm trying
>> to work towards Reproducible Computational Chemistry. I am some way away
>> (though making progress).
>>
>> Data processing is almost inevitable. In many projects it goes something
>> like:
>> RawData => CleanedData => FilteredData
>> The raw data usually comes from an instrument. It's almost never absolute.
>> It needs adjusting for calibration, for noise and many other common
>> artefacts. This is not "munging the data to fit the theory", it is trying to
>> create an abstraction which is independent of the particular experimental
>> setup.
>>
>> Then it needs filtering. A typical example is recognising significant
>> effects and separating these from noise. Of course noise is *sometimes* a
>> new scientific effect - e.g. pulsars - but most of the time it is
>> unexplained noise. We trust the domain expert to clean and filter the data.
>>
>> This takes time. There's little (IMO) to be gained by exposing this, any
>> more than the calibration of microscopes and balances. I'm personally happy
>> for it to be exposed, but not to find the time and resources to do it.
>>
>> So I think the OKF is in a unique position to bring clarity and neutrality
>> into this. One way might be to have Panton discussions on this - with a
>> range of points of view. Another would be to see OKF white-papers on the
>> realistic and unrealistic expectations of exposing data. If these were
>> compelling then they could inform decision-making by funders, publishers,
>> etc. I liked Chris Rusbridge's comment on the FOI-isation of data. We have
>> to have some perspective. There has to be a timegap between conception and
>> exposure, just as there has to be for most other information under FOI. And
>> I think we'd all agree FOI is a very blunt instrument.
>>
>> In summary "Data Is Difficult". The OKF is well positioned to outline
>> those difficulties carefully and neutrally and provide guidelines for
>> addressing them.
>>
>> P.
>>
>>
>> On Mon, Jul 19, 2010 at 9:08 PM, Lance McKee <lmckee at opengeospatial.org>
>> wrote:
>> Re:
>>
>>
>> so maybe it all comes back to better ways of annotating data.
>> I think it all comes down to science funders requiring metadata, and
>> requiring metadata that conforms to standards. Each discipline's data
>> coordination body, coordinating with other data coordinating bodies, needs
>> to look at ISO metadata standards, open source and proprietary metadata
>> tools, standards for encoding data and standards for interfaces on software
>> that produces or ingests data.
>>
>> For geospatial data, for example, there are the ISO 19115 and ISO 19119
>> metadata standards and OGC standards, including the OGC ("OpenGIS")
>> Observations and Measurements Encoding Standard (O&M) (
>> http://www.opengeospatial.org/standards/om) .
>>
>> In a Web services world, we will discover that the distinction between
>> "data files" and "metadata files" is an artifact of primitive 20th century
>> computer technology. There should be no distinction between these, other
>> than distinctions for the parts of a record, such as we have for books:
>> front cover and title, copyright page, title page, table of contents,
>> dedication page and acknowledgments, preface, introduction, body, footnotes,
>> index, glossary, back cover.  Is such a structure too complex for today's
>> digital science documents?
>>
>> Speak up and be bold, data curators, and coordinate like our planetary
>> lease depended on it!
>>
>> Lance
>>
>> On Jul 19, 2010, at 2:07 PM, Jessy Cowan-Sharp wrote:
>>
>> interesting development. one point this article raises is the question of
>> how requirements for release might impact longitudinal studies, and in
>> general makes me wonder, when does something become FoI/FOIA-able? when it's
>> pencil marks in a notebook? when the file has been saved? when a
>> statistically significant number of observations have been made?
>>
>> or put another way, thinking more about sharing than hostile release, when
>> is a data set complete enough to share? when does a set of observations
>> become a data set? are there ways to dictate "in progress" or even "stream
>> data"?
>>
>> one of the biggest arguments against data sharing in science is that those
>> who haven't been intimately involved with the project "wouldn't get it".
>> this seems like a misnomer to me, since lack of availability/exposure to raw
>> data only exacerbates our lack of literacy with it. but especially with
>> charged issues like climate change it's easy to see how sharing can
>> backfire.
>>
>> so maybe it all comes back to better ways of annotating data.
>>
>> anyway, bit of a rhetorical rant i guess, but worth thinking about.
>>
>> jessy
>>
>>
>> On Mon, Jul 19, 2010 at 1:11 PM, Jonathan Gray <jonathan.gray at okfn.org>
>> wrote:
>> Interesting...
>>
>>
>> http://www.timeshighereducation.co.uk/story.asp?sectioncode=26&storycode=412475&c=2
>>
>> --
>> Jonathan Gray
>>
>> Community Coordinator
>> The Open Knowledge Foundation
>> http://blog.okfn.org
>>
>> http://twitter.com/jwyg
>> http://identi.ca/jwyg
>>
>> _______________________________________________
>> open-science mailing list
>> open-science at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-science
>>
>>
>>
>> --
>> Jessy Cowan-Sharp
>> http://jessykate.com
>>
>> --
>> Peter Murray-Rust
>> Reader in Molecular Informatics
>> Unilever Centre, Dep. Of Chemistry
>> University of Cambridge
>> CB2 1EW, UK
>> +44-1223-763069
>>


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069