[open-science] Why not publish data?

Gavin Baker gavin at gavinbaker.com
Mon Mar 16 06:00:26 UTC 2009

is the title of a post just published on my blog:

I'm eager for comments and critique. Feel free to comment on the blog
post or respond on this list. I'll copy the post below to facilitate
discussion:

> I try to avoid writing things that may make me sound stupid, but this
> post falls in that category.
> Recently I was reading about efforts related to data sharing:
> technological infrastructure, curation, educating researchers, and
> the like. I was struck by the thought that most of the advocacy for
> data sharing boils down to an exhortation to stick it in a digital
> repository.
> This seems a bit odd considering that much of what propels science is
> the pressure to publish (written) results (in journals, conferences,
> monographs, etc.). There is a hierarchy of venues in terms of
> prestige, which is in turn linked to research funding, promotion,
> public attention (media coverage, policy influence), etc.
> Might the best way to get researchers to share data be to create a
> similar system for datasets? It might provide a compelling incentive.
> Moreover, publishing might provide a compelling incentive for the
> related task of data curation (making data understandable / usable
> to others, e.g. through formatting, annotation, etc.). Currently,
> much data doesn't see much use outside the lab where it was
> generated, so researchers have little incentive to spend time
> "prettying it up" for others (who may find the way it was recorded to
> be inscrutable). Even if they are convinced to "share" their data by
> posting it online, it may seem quite a low priority to spend time
> making it useful to others. If there were pressure to publish the
> dataset, though, then researchers would have that incentive to make
> the data as intuitively useful to others as practicable, so reviewers
> could quickly identify the novelty of the data.
> This doesn't seem so outlandish to me. There are similar efforts to
> provide publication fora for materials which were traditionally
> unpublished (we might say undersupplied), such as negative results
> and experimental techniques.
> If you think of it in terms of a CV, the difference is between these
> lines:
> * Created and shared large, valuable dataset which is highly regarded
> by peers 
> * Publication in J. Big Useful Datasets, impact factor X
> It may be hard for a reviewer to quantify or validate the former; the
> latter demonstrates that the researcher's contribution has already
> been validated and provides built-in metrics to quantify the
> contribution.
> There are other ways to skin the same cat. One option would be to
> build alternative systems for conferring recognition (e.g. awards,
> metrics for contributions to shared datasets, etc.). Another
> approach is to make data sharing a more enforceable part of other
> scientific endeavors, e.g. mandatory as a condition of research
> funding, mandatory as a condition of publication (of written results)
> in a journal, etc. I think multiple approaches will yield the best
> result. It seems to me that creating "journals" (or some other name)
> for "publishing" datasets could be a useful way to spur
> participation.
> Has this been done already? What are the drawbacks to this approach?

Gavin Baker
gavin at gavinbaker.com

Science is not everything, but science is very beautiful.
    J. Robert Oppenheimer
