[open-science] data repository primer?

Matt Jones jones at nceas.ucsb.edu
Wed Oct 17 04:04:09 UTC 2012


Hi Tom,

This is a great case study, as it illustrates some of the challenges to be
found when one gets down to the details in a specific discipline.  While
FigShare and Dryad and OKF are awesome for a lot of things, I don't think
they are the best choice for this type of data.  There is a
well-established set of data repositories for atmospheric data in the US
and other countries that would be set up to handle this (like the Oak Ridge
National Lab DAAC that houses similar data, or the National Snow and Ice
Data Center, or the National Oceanographic Data Center), and they already
are dealing with atmospheric modeling data like you describe.

Just today I was talking to folks from the MsTMIP project, which is
comparing model outputs for global atmospheric carbon models, and the
consensus there is that these model output data sets are each in the
one to multi-terabyte range (http://nacp.ornl.gov/mast-dc/MsTMIP.shtml).
If your data are similar, then they likely are not appropriate for
web-based data upload systems as exemplified by FigShare, or even for
network transfer at all.  Data sets of that size most often are moved
around by FedExing hard drives, or on dedicated high-speed networks over
protocols like GridFTP that support parallelism in network transfers (but
even then it is painful).
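
To make the scale concrete, here is a rough back-of-the-envelope sketch of
how long a data set in the one to multi-terabyte range might take to move
over a network.  The link speeds, the 70% efficiency factor, and the
function name transfer_hours are illustrative assumptions, not
measurements from MsTMIP or any repository.

```python
# Back-of-the-envelope transfer times for large model outputs.
# All link speeds and the efficiency factor are illustrative assumptions.


def transfer_hours(size_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to move size_tb (decimal) terabytes over a link_mbps link.

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8  # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical link classes, nominal speed in Mbit/s.
    links = {
        "10 Mbit/s upload": 10,
        "100 Mbit/s campus": 100,
        "10 Gbit/s dedicated": 10_000,
    }
    for size_tb in (1, 5):
        for name, mbps in links.items():
            hours = transfer_hours(size_tb, mbps)
            print(f"{size_tb} TB over {name}: {hours:.1f} hours")
```

Under these assumptions a single terabyte over a 10 Mbit/s uplink takes
on the order of two weeks, which is why shipping drives or using a
dedicated high-speed link tends to win at this scale.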

There are efforts like GEOSS and the DataONE (http://dataone.org) project
that I'm involved in that are trying to enable interoperability among these
many extant repositories so that the data can be discovered regardless of
where they are housed.  Dryad and the ORNL DAAC and the KNB and many other
repositories are working together to see this happen.  I hope that you can
choose an existing repository that is set up for your scale of data and be
able to have it fit into a larger federation.  If you do decide to try to
upload atmospheric model data to FigShare, you should certainly contact
them beforehand so you don't give their sysadmins a heart attack, and to be
sure that you meet their restrictions (e.g., for Dryad, data need to be
associated with a specific publication).  I know a number of people who
are working on this specific issue concerning archiving and discovery
of atmospheric data, and I'd be happy to put you in touch with them if you
are interested.

Regards,

Matt

On Tue, Oct 16, 2012 at 7:00 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:

> Greetings and thanks for this
>
> On Wed, Oct 17, 2012 at 3:26 AM, Tom Roche <Tom_Roche at pobox.com> wrote:
>
>>
>> Please point me toward one or more short introductions, for a
>> computational and scientific audience, to current options for data sharing
>> and archiving. Why I ask:
>>
>> I attended a conference today for users and developers of an atmospheric
>> model. Mostly it was presentations of research results, but we also had a
>> long general meeting about the model, its development, and (mostly) the
>> need for related tools and infrastructures. One topic was the need for
>> better data sharing and management: we currently tend to ship a lot of
>> physical hard drives after searching our social networks for folks
>> with needed datasets. One response is to start a torrent network, but we
>> also need ways/places to archive (preferably searchably). I gave a quick
>> OTTOMH talk about some repository options of which I'm aware (pangaea.de,
>> figshare.com, thedatahub.org) and gave props to OKF.
>>
>> I'd like to follow that up with pointers to more information (seed the
>> discussion, to continue the torrent metaphor :-) and would appreciate your
>> advice regarding appropriate sources of information on the topic.
>> (Evangelism is OK too: while definitely international, this is a mostly-US
>> group which is generally unexposed to open-science norms.)
>>
>>
> What follows is my own opinion ...
> We are at a very early stage in this and there are no standard approaches.
> It differs greatly between disciplines. A few disciplines require data to
> be reposited in full (e.g. crystallography). I'd add http://datadryad.org/
> to your list.  Figshare was started because there was no communal solution.
>
> Funders are starting to require data management but they don't say how.
>
> The key thing is that your discipline owns the problem. A good solution
> might be a learned society, but this doesn't always work out - IMO the Am
> Chem Soc is a problem rather than a solution. A number of journals require
> deposition whereas others actively refuse to take data.
>
> P.
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>