[open-science] Times Higher Education article on opening up research data

Lance McKee lmckee at opengeospatial.org
Tue Jul 20 12:47:19 UTC 2010


Peter,

Thank you for your reply. It's certainly true in the geospatial world  
that "data is difficult".

I would be very interested to know what you think of the OGC
Observations and Measurements Encoding Standard
(http://www.opengeospatial.org/standards/om). It defines an abstract
model and an XML schema [www.w3.org/XML/Schema] encoding for
observations and it provides support for common sampling strategies.
O&M also provides a general framework for systems that deal in
technical measurements in science and engineering. This is one of the
OGC Sensor Web Enablement (SWE)
[http://www.opengeospatial.org/ogc/markets-technologies/swe] suite of
standards, which have applicability outside the geospatial domain.

I am a writer and promoter of ideas, not a scientist or technical  
expert, but I understand the issues well enough to suggest with  
conviction that the OGC provides both models of deliberative process  
and models of technical approaches that the Open Data community could  
usefully adapt as planks in their platform. (OGC standards and
participation have proven useful to OneGeology
[http://www.onegeology.org/] and efforts in ocean observation,
meteorology & climate, and hydrology.) Demonstrating solutions to the
technical "data publish/discover/assess/access/use" problems --
"See? It works!" -- puts
pressure on scientists and the institutions of science to  
collaboratively develop new policies and practices.

As you note, the term "Open Data" is going mainstream, and there is  
considerable confusion about what it means. The OGC began as the "Open
GIS Consortium" and later renamed itself the "Open Geospatial
Consortium", and we are constantly educating audiences about the
differences between "open standards" and "open source", encodings and
data models, consensus-derived open interfaces and proprietary open
interfaces, etc. Considerable thought and effort go into "positioning" (as
marketers say) one's organization and the organization's mission and  
products. It's all part of shaping the human world.

Lance

Lance McKee
Senior Staff Writer
Open Geospatial Consortium (OGC)
508-752-0108
lmckee at opengeospatial.org

The OGC: International Location Standards
http://www.opengeospatial.org




On Jul 20, 2010, at 1:37 AM, Peter Murray-Rust wrote:

> I think this is a real opportunity for the OKF - although it will be  
> a lot of hard work.
>
> "Open Data" is now hitting us from every side. Five years ago the  
> term was unknown. Many of the people who are arguing for Open Data  
> (e.g. the THES article, Climategate, etc.) think Open Data is a
> trivial matter of pressing a button and exposing your nicely
> ordered, recomputable data. Refusal to do so means you are selfish  
> and uncooperative.
>
> It's not like that. Data are messy, sprawling, incompletely  
> understood. They are often created or gathered by people who don't  
> understand data. In our own department many people have "lost data".  
> This isn't because they are deliberately irresponsible (they aren't)  
> but because the infrastructure (technical and cultural) isn't there.
>
> I have been trying to aspire to Open Notebook Science in  
> computational chemistry - where all the data are exposed at the time  
> of calculation. It's very difficult technically. The first time I  
> tried it I failed. I'm trying to work towards Reproducible  
> Computational Chemistry. I am some way away (though making progress).
>
> Data processing is almost inevitable. In many projects it goes  
> something like:
> RawData => CleanedData => FilteredData
> The raw data usually comes from an instrument. It's almost never  
> absolute. It needs adjusting for calibration, for noise and many  
> other common artefacts. This is not "munging the data to fit the  
> theory", it is trying to create an abstraction which is independent  
> of the particular experimental setup.
>
> Then it needs filtering. A typical example is recognising  
> significant effects and separating these from noise. Of course noise  
> is *sometimes* a new scientific effect - e.g. pulsars - but most of  
> the time it is unexplained noise. We trust the domain expert to  
> clean and filter the data.
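
The RawData => CleanedData => FilteredData steps described above can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions, not anyone's actual processing code: the linear
instrument response (gain, offset), the robust threshold k, and both
function names are hypothetical choices made for the example.

```python
from statistics import median


def calibrate(raw, gain=1.0, offset=0.0):
    """RawData -> CleanedData.

    Assumes (hypothetically) a linear instrument response:
    reading = gain * true_value + offset.
    """
    return [(x - offset) / gain for x in raw]


def filter_significant(cleaned, k=3.0):
    """CleanedData -> FilteredData.

    Keeps (index, value) pairs lying more than k robust standard
    deviations from the median. The robust standard deviation is
    estimated from the median absolute deviation (MAD).
    """
    centre = median(cleaned)
    mad = median([abs(x - centre) for x in cleaned])
    sigma = 1.4826 * mad  # MAD-to-sigma factor for normally distributed noise
    return [(i, x) for i, x in enumerate(cleaned) if abs(x - centre) > k * sigma]


# Made-up readings with one obvious spike at index 4.
raw = [10.2, 10.0, 9.9, 10.1, 25.0, 10.0, 9.8]
cleaned = calibrate(raw, gain=2.0)
filtered = filter_significant(cleaned)
```

Even in this toy version the choices of gain, offset and k are judgment
calls by whoever knows the instrument, which is the point made above
about trusting the domain expert.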
>
> This takes time. There's little (IMO) to be gained by exposing this,  
> any more than the calibration of microscopes and balances. I'm  
> personally happy for it to be exposed, but not to find the time and
> resources to do it.
>
> So I think the OKF is in a unique position to bring clarity and  
> neutrality into this. One way might be to have Panton discussions on  
> this - with a range of points of view. Another would be to see OKF  
> white-papers on the realistic and unrealistic expectations of  
> exposing data. If these were compelling then they could inform  
> decision-making by funders, publishers, etc. I liked Chris  
> Rusbridge's comment on the FOI-isation of data. We have to have some  
> perspective. There has to be a timegap between conception and  
> exposure, just as there has to be for most other information under  
> FOI. And I think we'd all agree FOI is a very blunt instrument.
>
> In summary, "Data Is Difficult". The OKF is well positioned to
> outline those difficulties carefully and neutrally and provide
> guidelines for addressing them.
>
> P.
>
>
> On Mon, Jul 19, 2010 at 9:08 PM, Lance McKee <lmckee at opengeospatial.org> wrote:
> Re:
>
>
> so maybe it all comes back to better ways of annotating data.
> I think it all comes down to science funders requiring metadata, and  
> requiring metadata that conforms to standards. Each discipline's  
> data coordination body, coordinating with other data coordinating  
> bodies, needs to look at ISO metadata standards, open source and  
> proprietary metadata tools, standards for encoding data and  
> standards for interfaces on software that produces or ingests data.
>
> For geospatial data, for example, there are the ISO 19115 and ISO
> 19119 metadata standards and OGC standards, including the OGC
> ("OpenGIS") Observations and Measurements Encoding Standard (O&M)
> (http://www.opengeospatial.org/standards/om).
>
> In a Web services world, we will discover that the distinction  
> between "data files" and "metadata files" is an artifact of primitive
> 20th century computer technology. There should be no distinction  
> between these, other than distinctions for the parts of a record,  
> such as we have for books: front cover and title, copyright page,  
> title page, table of contents, dedication page and acknowledgments,  
> preface, introduction, body, footnotes, index, glossary, back  
> cover.  Is such a structure too complex for today's digital science  
> documents?
>
> Speak up and be bold, data curators, and coordinate like our  
> planetary lease depended on it!
>
> Lance
>
>
>
> On Jul 19, 2010, at 2:07 PM, Jessy Cowan-Sharp wrote:
>
> interesting development. one point this article raises is the  
> question of how requirements for release might impact longitudinal  
> studies, and in general makes me wonder, when does something become  
> FoI/FOIA-able? when it's pencil marks in a notebook? when the file  
> has been saved? when a statistically significant number of  
> observations have been made?
>
> or put another way, thinking more about sharing that hostile  
> release, when is a data set complete enough to share? when does a  
> set of observations become a data set? are there ways to dictate "in  
> progress" or even "stream data"?
>
> one of the biggest arguments against data sharing in science is that  
> those who haven't been intimately involved with the project  
> "wouldn't get it". this seems like a misnomer to me, since lack of  
> availability/exposure to raw data only exacerbates our lack of  
> literacy with it. but especially with charged issues like climate  
> change it's easy to see how sharing can backfire.
>
> so maybe it all comes back to better ways of annotating data.
>
> anyway, bit of a rhetorical rant i guess, but worth thinking about.
>
> jessy
>
>
> On Mon, Jul 19, 2010 at 1:11 PM, Jonathan Gray  
> <jonathan.gray at okfn.org> wrote:
> Interesting...
>
> http://www.timeshighereducation.co.uk/story.asp?sectioncode=26&storycode=412475&c=2
>
> --
> Jonathan Gray
>
> Community Coordinator
> The Open Knowledge Foundation
> http://blog.okfn.org
>
> http://twitter.com/jwyg
> http://identi.ca/jwyg
>
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-science
>
>
>
> -- 
> Jessy Cowan-Sharp
> http://jessykate.com
>
>
> -- 
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069








