[open-science] data repository primer?

Tom Roche Tom_Roche at pobox.com
Wed Oct 17 11:22:53 UTC 2012


Tom Roche Tue, 16 Oct 2012 23:08:30 -0400
>>> One [conference] topic was the need for better data sharing and
>>> management: we currently tend to physically ship a lot of physical
>>> hard drives after searching our social networks for folks with
>>> needed datasets. One response is to start a torrent network, but
>>> we also need ways/places to archive searchably[, hence] I'd like
>>> to find out more about interests and plans in this space.

Peter Murray-Rust Wed, 17 Oct 2012 04:00:36 +0100
>> I'd add http://datadryad.org/ to your list.

As (barely) mentioned: CMAQ's community is regulatory as well as
research. Without going into detail, the model is widely used by
governments and NGOs to forecast air quality, and this use is often
required by law. Hence,

* academic publication is not relevant for much of our community

* much data of interest is currently "published" via things like
  court dockets, which are not easy to find or parse

So Dryad seems unlikely to serve a large proportion of our community,
but it's definitely worth adding for those it might serve.

Matt Jones Tue, 16 Oct 2012 20:04:09 -0800
> There is a well-established set of data repositories for atmospheric
> data in the US and other countries that would be set up to handle
> this (like the Oak Ridge National Lab DAAC that houses similar data,
> or the National Snow and Ice Data Center, or the National
> Oceanographic Data Center)

Thanks for those suggestions. I'm more aware of NOAA's NCDC. My
impression is that establishing oneself as a contributor at these
federal repositories is often difficult if one is not a federal
entity. Is that correct?

> model output data sets are each in the one to multi-terabyte range

Actually, most of the interest expressed at the meeting concerned
reference *input* datasets used to set up runs, which are *much*
smaller.

> There are efforts like GEOSS and the DataONE (http://dataone.org)
> project that I'm involved in that are trying to enable
> interoperability among these many extant repositories so that the
> data can be discovered regardless of where they are housed.

Thanks again.

> I know a number of people that are working on this specific issue
> concerning archiving and discovery of atmospheric data, and I'd be
> happy to put you in touch with them if you were interested.

Please do. CMAQers definitely seemed interested, so I'd like to
provide folks with options for followup. (And then get back to my
thesis :-)

TIA, Tom Roche <Tom_Roche at pobox.com>



