[open-science] open-science Digest, Vol 6, Issue 5

Mon Mar 16 11:13:25 UTC 2009

Nice thread.

Is there an index of data repositories?

Who is gearing up to curate and maintain BIG data? Google pulled out, Amazon
is starting to get its feet wet in this area; anyone else?

- Ian

On Mon, Mar 16, 2009 at 10:32 AM, <open-science-request at lists.okfn.org> wrote:

> Today's Topics:
>
>   1. Why not publish data? (Gavin Baker)
>   2. Re: Why not publish data? (a.p.swan at talk21.com)
>   3. Re: Why not publish data? (Peter Murray-Rust)
>   4. Re: Why not publish data? (Hide, Branwen)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 16 Mar 2009 02:00:26 -0400
> From: Gavin Baker <gavin at gavinbaker.com>
> Subject: [open-science] Why not publish data?
> To: SPARC-OpenData <SPARC-OpenData at arl.org>,
>        open-science at lists.okfn.org
> Message-ID: <49BDEAFA.6090207 at gavinbaker.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> is the title of a post just published on my blog:
> http://www.gavinbaker.com/2009/03/16/why-not-publish-data/
>
> I'm eager for comments and critique. Feel free to comment on the blog
> post or respond on this list. I'll copy the post below to facilitate
> discussion:
>
> > I try to avoid writing things that may make me sound stupid, but this
> > post falls in that category.
> >
> > Recently I was reading about efforts related to data sharing:
> > technological infrastructure, curation, educating researchers, and
> > the like. I was struck by the thought that most of the advocacy for
> > data sharing boils down to an exhortation to stick it in a digital
> > repository.
> >
> > This seems a bit odd considering that much of what propels science is
> > the pressure to publish (written) results (in journals, conferences,
> > monographs, etc.). There is a hierarchy of venues in terms of
> > prestige, which is in turn linked to research funding, promotion,
> > public attention (media coverage, policy influence), etc.
> >
> > Might the best way to get researchers to share data be to create a
> > similar system for datasets? It might provide a compelling incentive.
> >
> > Moreover, publishing might provide a compelling incentive to the
> > related issue of data curation (making data understandable / usable
> > to others, e.g. through formatting, annotation, etc.). Currently,
> > much data doesn't see much use outside the lab where it was
> > generated, so researchers have little incentive to spend time
> > "prettying it up" for others (who may find the way it was recorded to
> > be inscrutable). Even if they are convinced to "share" their data by
> > posting it online, it may seem quite a low priority to spend time
> > making it useful to others. If there was pressure to publish the
> > dataset, though, then researchers would have that incentive to make
> > the data as intuitively useful to others as practicable, so reviewers
> > could quickly identify the novelty of the data.
> >
> > This doesn't seem so outlandish to me. There are similar efforts to
> > provide publication fora for materials which were traditionally
> > unpublished (we might say undersupplied), such as negative results
> > and experimental techniques.
> >
> > If you think of it in terms of a CV, the difference is between these
> > lines:
> >
> > * Created and shared large, valuable dataset which is highly regarded
> > by peers
> > * Publication in J. Big Useful Datasets, impact factor X
> >
> > It may be hard for a reviewer to quantify or validate the former; the
> > latter demonstrates that the researcher's contribution has already
> > been validated and provides built-in metrics to quantify the
> > contribution.
> >
> > There are other ways to skin the same cat. One option would be to
> > build alternative systems for conferring recognition (e.g. awards,
> > metrics for contributions to shared datasets, etc.). The other
> > approach is to make data sharing a more enforceable part of other
> > scientific endeavors, e.g. mandatory as a condition of research
> > funding, mandatory as a condition of publication (of written results)
> > in a journal, etc. I think multiple approaches will yield the best
> > result. It seems to me that creating "journals" (or some other name)
> > for "publishing" datasets could be a useful way to spur
> > participation.
> >
> > Has this been done already? What are the drawbacks to this approach?
>
> Best,
> --
> Gavin Baker
> http://www.gavinbaker.com/
> gavin at gavinbaker.com
>
> Science is not everything, but science is very beautiful.
>    J. Robert Oppenheimer
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 16 Mar 2009 07:39:54 +0000 (GMT)
> From: a.p.swan at talk21.com
> Subject: Re: [open-science] Why not publish data?
> To: SPARC-OpenData <SPARC-OpenData at arl.org>,
>        open-science at lists.okfn.org,    Gavin Baker <gavin at gavinbaker.com>
> Message-ID: <929402.89003.qm at web86405.mail.ird.yahoo.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Gavin,
>
> There are lots of moves in the right direction already. All the UK Research
> Councils have policies on data publishing, as do Wellcome, the European
> Research Council and various others.
>
> At a meeting convened by the ESRC last Thursday in London we heard that a
> system for developing and assigning DOI-like identifiers to datasets is
> under development. And of course some journals in some fields require
> supporting data to be published before they will publish articles. This
> usually means that the author must provide, for example, a GenBank accession
> number for the dataset, but sometimes journals require the dataset so that
> they can publish it themselves on their own websites. This is far from optimal
> because they can usually only handle PDF, which is not a readily re-usable format.
> But it's a start and a good principle to establish.
>
> The issue of career-related rewards for publishing data is still
> unresolved, though, and until we get to a point where researchers see an
> explicit link between sharing their data and career progression we will
> still be stuck in the situation where journal articles are the currency.
> This is artificial in some disciplines now, where practice and cultures are
> ready to consider datasets as the primary outputs but are hampered by the
> lack of reward mechanisms.
>
> Our study on researchers' attitudes and practices with respect to data
> publication reports in detail on these issues and makes recommendations,
> including related to the issue of reward. It can be found here:
> http://eprints.ecs.soton.ac.uk/16742/
>
> Alma Swan
> Key Perspectives Ltd
> Truro, UK
>
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> http://lists.okfn.org/cgi-bin/mailman/listinfo/open-science
>
> ------------------------------
>
> Message: 3
> Date: Mon, 16 Mar 2009 07:54:16 +0000
> From: Peter Murray-Rust <pm286 at cam.ac.uk>
> Subject: Re: [open-science] Why not publish data?
> To: Gavin Baker <gavin at gavinbaker.com>
> Cc: SPARC-OpenData <SPARC-OpenData at arl.org>,
>        open-science at lists.okfn.org
> Message-ID:
>        <67fd68330903160054x2417aad6n2530f734b6a06d69 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Thanks for posting to the list - I hope it generates discussion.
>
> On Mon, Mar 16, 2009 at 6:00 AM, Gavin Baker <gavin at gavinbaker.com> wrote:
>
> > is the title of a post just published on my blog:
> > http://www.gavinbaker.com/2009/03/16/why-not-publish-data/
> >
> > I'm eager for comments and critique. Feel free to comment on the blog
> > post or respond on this list. I'll copy the post below to facilitate
> > discussion:
> >
> > > I try to avoid writing things that may make me sound stupid, but this
> > > post falls in that category.
> > >
> > > Recently I was reading about efforts related to data sharing:
> > > technological infrastructure, curation, educating researchers, and
> > > the like. I was struck by the thought that most of the advocacy for
> > > data sharing boils down to an exhortation to stick it in a digital
> > > repository.
>
>
> This would be simplistic. There are many digital repositories which serve
> this purpose but lots of thought and management go into them. They include
> much of bioscience (EBI, NCBI, PDB, etc.) which you could call digital
> repositories, but they are domain repositories managed by domain-oriented
> scientists. It is often mandatory to deposit at point of publication.
>
> What is simplistic is to think that simply putting data into institutional
> repositories will be of great use. But I don't hear a great deal of
> advocacy for that.
>
> >
> > >
> > ...
>
>
>
> > >
> > > Might the best way to get researchers to share data be to create a
> > > similar system for datasets? It might provide a compelling incentive.
> > >
> > > If you think of it in terms of a CV, the difference is between these
> > > lines:
>
> ...
>
> >
> > >
> > > * Created and shared large, valuable dataset which is highly regarded
> > > by peers
> > > * Publication in J. Big Useful Datasets, impact factor X
>
>
> Several people are advocating something like this. It depends on the
> culture of the domain. I have encountered many peers who do not regard
> creating tools and datasets as "proper science". But this is very domain
> dependent.
>
> >
> > >
> > > It may be hard for a reviewer to quantify or validate the former; the
> > > latter demonstrates that the researcher's contribution has already
> > > been validated and provides built-in metrics to quantify the
> > > contribution.
>
>
> There are no built-in metrics, any more than there are built-in metrics for
> publications/papers. Each data repository will be judged by the community.
> It may give rise to metrics or it may not.
>
>
> > >
> > > There are other ways to skin the same cat. One option would be to
> > > build alternative systems for conferring recognition (e.g. awards,
> > > metrics for contributions to shared datasets, etc.). The other
> > > approach is to make data sharing a more enforceable part of other
> > > scientific endeavors, e.g. mandatory as a condition of research
> > > funding, mandatory as a condition of publication (of written results)
> > > in a journal, etc. I think multiple approaches will yield the best
> > > result. It seems to me that creating "journals" (or some other name)
> > > for "publishing" datasets could be a useful way to spur
> > > participation.
>
>
> Some funders and some domains already do this. Others, like chemistry and
> materials, see data as a valuable competitive advantage or real-world IP.
>
> P.
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>
> ------------------------------
>
> Message: 4
> Date: Mon, 16 Mar 2009 10:32:32 -0000
> From: "Hide, Branwen" <Branwen.Hide at rin.ac.uk>
> Subject: Re: [open-science] Why not publish data?
> To: "Peter Murray-Rust" <pm286 at cam.ac.uk>, "Gavin Baker"
>        <gavin at gavinbaker.com>
> Cc: SPARC-OpenData <SPARC-OpenData at arl.org>,
>        open-science at lists.okfn.org
> Message-ID:
>        <CE0B7072F12A4445B9D57BCF657430840322D2A5 at w2k3-lonex2.ad.bl.uk>
> Content-Type: text/plain; charset="us-ascii"
>
> The Royal Meteorological Society and the National Centre for Atmospheric
> Science are in the process of developing a data journal specifically for
> Meteorological Data. A report of the project is due at the end of April.
> More information on the programme can be found at
> http://www.jisc.ac.uk/whatwedo/programmes/reppres/sue/ojims.aspx or on
> the project site http://proj.badc.rl.ac.uk/ojims
>
>
>
> Branwen Hide
>
> Research Information Network (www.rin.ac.uk)
>
> ------------------------------
>
>
> End of open-science Digest, Vol 6, Issue 5
> ******************************************
>

-- 

----
ian at mulvany.net | +447506679466
(new number, no voicemail, txt or email please)