[open-science] Why not publish data?

Mon Mar 16 07:39:54 UTC 2009

Hi Gavin,

There are lots of moves in the right direction already. All the UK Research Councils have policies on data publishing, as do Wellcome, the European Research Council and various others.

At a meeting convened by the ESRC last Thursday in London we heard that a system for assigning DOI-like identifiers to datasets is under development. And of course some journals in some fields require supporting data to be published before they will publish articles. This usually means that the author must provide, for example, a GenBank accession number for the dataset, but sometimes journals require the dataset so that they can publish it themselves on their own websites. This is far from optimal because they can usually only handle PDF, which is not a readily re-usable format. But it's a start and a good principle to establish.

The issue of career-related rewards for publishing data is still unresolved, though. Until researchers see an explicit link between sharing their data and career progression, we will remain stuck in the situation where journal articles are the currency. This is already artificial in some disciplines, where practices and cultures are ready to treat datasets as the primary outputs but are hampered by the lack of reward mechanisms.

Our study on researchers' attitudes and practices with respect to data publication reports in detail on these issues and makes recommendations, including on the issue of reward. It can be found here: http://eprints.ecs.soton.ac.uk/16742/

Alma Swan
Key Perspectives Ltd
Truro, UK

--- On Mon, 16/3/09, Gavin Baker <gavin at gavinbaker.com> wrote:
From: Gavin Baker <gavin at gavinbaker.com>
Subject: [open-science] Why not publish data?
To: "SPARC-OpenData" <SPARC-OpenData at arl.org>, open-science at lists.okfn.org
Date: Monday, 16 March, 2009, 6:00 AM

is the title of a post just published on my blog:
http://www.gavinbaker.com/2009/03/16/why-not-publish-data/

I'm eager for comments and critique. Feel free to comment on the blog
post or respond on this list. I'll copy the post below to facilitate
discussion:

> I try to avoid writing things that may make me sound stupid, but this
> post falls in that category.
> 
> Recently I was reading about efforts related to data sharing:
> technological infrastructure, curation, educating researchers, and
> the like. I was struck by the thought that most of the advocacy for
> data sharing boils down to an exhortation to stick it in a digital
> repository.
> 
> This seems a bit odd considering that much of what propels science is
> the pressure to publish (written) results (in journals, conferences,
> monographs, etc.). There is a hierarchy of venues in terms of
> prestige, which is in turn linked to research funding, promotion,
> public attention (media coverage, policy influence), etc.
> 
> Might the best way to get researchers to share data be to create a
> similar system for datasets? It might provide a compelling incentive.
> 
> Moreover, publishing might provide a compelling incentive to the
> related issue of data curation (making data understandable / usable
> to others, e.g. through formatting, annotation, etc.). Currently,
> much data doesn't see much use outside the lab where it was
> generated, so researchers have little incentive to spend time
> "prettying it up" for others (who may find the way it was recorded to
> be inscrutable). Even if they are convinced to "share" their data by
> posting it online, it may seem quite a low priority to spend time
> making it useful to others. If there was pressure to publish the
> dataset, though, then researchers would have that incentive to make
> the data as intuitively useful to others as practicable, so reviewers
> could quickly identify the novelty of the data.
> 
> This doesn't seem so outlandish to me. There are similar efforts to
> provide publication fora for materials which were traditionally
> unpublished (we might say undersupplied), such as negative results
> and experimental techniques.
> 
> If you think of it in terms of a CV, the difference is between these
> lines:
> 
> * Created and shared large, valuable dataset which is highly regarded
> by peers 
> * Publication in J. Big Useful Datasets, impact factor X
> 
> It may be hard for a reviewer to quantify or validate the former; the
> latter demonstrates that the researcher's contribution has already
> been validated and provides built-in metrics to quantify the
> contribution.
> 
> There are other ways to skin the same cat. One option would be to
> build alternative systems for conferring recognition (e.g. awards,
> metrics for contributions to shared datasets, etc.). The other
> approach is to make data sharing a more enforceable part of other
> scientific endeavors, e.g. mandatory as a condition of research
> funding, mandatory as a condition of publication (of written results)
> in a journal, etc. I think multiple approaches will yield the best
> result. It seems to me that creating "journals" (or some other name)
> for "publishing" datasets could be a useful way to spur
> participation.
> 
> Has this been done already? What are the drawbacks to this approach?

Best,
-- 
Gavin Baker
http://www.gavinbaker.com/
gavin at gavinbaker.com

Science is not everything, but science is very beautiful.
    J. Robert Oppenheimer
