[open-science] Why not publish data?

Peter Murray-Rust pm286 at cam.ac.uk
Mon Mar 16 07:54:16 UTC 2009

Thanks for posting to the list - I hope it generates discussion.

On Mon, Mar 16, 2009 at 6:00 AM, Gavin Baker <gavin at gavinbaker.com> wrote:

> is the title of a post just published on my blog:
> http://www.gavinbaker.com/2009/03/16/why-not-publish-data/
> I'm eager for comments and critique. Feel free to comment on the blog
> post or respond on this list. I'll copy the post below to facilitate
> discussion:
> > I try to avoid writing things that may make me sound stupid, but this
> > post falls in that category.
> >
> > Recently I was reading about efforts related to data sharing:
> > technological infrastructure, curation, educating researchers, and
> > the like. I was struck by the thought that most of the advocacy for
> > data sharing boils down to an exhortation to stick it in a digital
> > repository.

This would be simplistic. There are many digital repositories which serve
this purpose but lots of thought and management go into them. They include
much of bioscience (EBI, NCBI, PDB, etc.) which you could call digital
repositories, but they are domain repositories managed by domain-oriented
scientists. It is often mandatory to deposit at point of publication.

What is simplistic is to think that simply putting data into institutional
repositories will be of great use. But I don't hear a great deal of advocacy
for that.

> >
> ...

> >
> > Might the best way to get researchers to share data be to create a
> > similar system for datasets? It might provide a compelling incentive.
> >
> > If you think of it in terms of a CV, the difference is between these
> > lines:


> >
> > * Created and shared large, valuable dataset which is highly regarded
> > by peers
> > * Publication in J. Big Useful Datasets, impact factor X

Several people are advocating something like this. It depends on the culture
of the domain. I have encountered many peers who do not regard creating tools
and datasets as "proper science". But this is very domain dependent.

> >
> > It may be hard for a reviewer to quantify or validate the former; the
> > latter demonstrates that the researcher's contribution has already
> > been validated and provides built-in metrics to quantify the
> > contribution.

There are no built-in metrics, any more than there are built-in metrics for
publications/papers. Each data repository will be judged by the community.
It may give rise to metrics or it may not.

> >
> > There are other ways to skin the same cat. One option would be to
> > build alternative systems for conferring recognition (e.g. awards,
> > metrics for contributions to shared datasets, etc.). The other
> > approach is to make data sharing a more enforceable part of other
> > scientific endeavors, e.g. mandatory as a condition of research
> > funding, mandatory as a condition of publication (of written results)
> > in a journal, etc. I think multiple approaches will yield the best
> > result. It seems to me that creating "journals" (or some other name)
> > for "publishing" datasets could be a useful way to spur
> > participation.

Some funders and some domains already do this. Others, like chemistry and
materials, see data as a valuable competitive advantage or real-world IP.


Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge