[open-science] Data Watch

Sat Jun 8 01:26:26 UTC 2013

Spinning the idea of an anonymous data-sharing-on-request service a
little further: what is missing to get this automated?

I guess a very practical issue is that journal articles (with the
exception of some types of data papers) are not marked up for the
type of data they report on, and an automated generic request to
authors, "please send me the data underlying your paper X", may well
be less effective at eliciting responses than something like
"please send me the T2-weighted images of the control group in
experiment 2, along with the respective blood screenings".
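
To make the difference concrete, here is a minimal sketch in Python of
how such a service might word its request. Everything in it is an
assumption for illustration: the DataRequest record, its fields and the
compose_request function are hypothetical, and nothing like this
exists yet. It falls back to the generic wording when the article gives
no hint about its data types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataRequest:
    """Hypothetical description of what is being requested from a paper."""
    paper_title: str
    # Specific items wanted; empty when the article is not marked up
    # for the type of data it reports on.
    items: List[str] = field(default_factory=list)


def compose_request(req: DataRequest) -> str:
    """Return a specific request if items are known, a generic one otherwise."""
    if not req.items:
        return f'Please send me the data underlying your paper "{req.paper_title}".'
    if len(req.items) == 1:
        wanted = req.items[0]
    else:
        # Join all but the last item with commas, then attach the last one.
        wanted = ", ".join(req.items[:-1]) + ", along with " + req.items[-1]
    return f'Please send me {wanted} from your paper "{req.paper_title}".'


generic = compose_request(DataRequest("X"))
specific = compose_request(DataRequest("X", [
    "the T2-weighted images of the control group in experiment 2",
    "the respective blood screenings",
]))
```

The specific branch can only be used if someone (or some markup) supplies
the items, which is exactly the piece that is missing today.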

But then again, exposing the (probably large) amount of non-responses
may be a nice basis for triggering discussions (which would have to be
field-specific, at least initially) on how to move towards more
effective ways of sharing data.

In cases where the authors actually do respond by sending some data,
the next challenge is then to determine what to do with it, since it
(a) is most likely not licensed for sharing further
(b) may have legit restrictions on sharing (e.g. patient privacy,
endangered species)
(c) may be in a proprietary or non-standard format
(d) may be incomplete
(e) may or may not be of a type for which suitable repositories exist
etc.
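
As a rough illustration, the checklist above could be recorded per
received dataset along the following lines. This is only a sketch under
my own assumptions: the ReceivedData record and the blocking_issues
function are hypothetical names, and the flags would have to be filled
in by a person who has looked at the data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReceivedData:
    """Hypothetical record, filled in by hand for each dataset received."""
    licensed_for_resharing: bool      # (a)
    has_sharing_restrictions: bool    # (b) e.g. patient privacy, endangered species
    open_standard_format: bool        # (c)
    complete: bool                    # (d)
    suitable_repository_exists: bool  # (e)


def blocking_issues(d: ReceivedData) -> List[str]:
    """List which of the points (a)-(e) stand in the way of sharing further."""
    checks = [
        (d.licensed_for_resharing, "(a) not licensed for further sharing"),
        (not d.has_sharing_restrictions, "(b) restrictions on sharing apply"),
        (d.open_standard_format, "(c) proprietary or non-standard format"),
        (d.complete, "(d) incomplete"),
        (d.suitable_repository_exists, "(e) no suitable repository"),
    ]
    # Keep the label of every check that failed.
    return [label for ok, label in checks if not ok]
```

An empty list would mean none of the listed points applies; it says
nothing about the "etc." cases.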

These latter considerations would apply whether the requests are
automated or not, and they could serve as the basis for a DataWatch
initiative that looks for problems in those data that are shared or
for which the available statements (e.g. in the respective papers)
are unclear or contradictory.

So while the data requesting and data watching ideas are heavily
intertwined, I think they require different skill and knowledge sets
and will probably appeal to different groups of people (some overlap
notwithstanding). The data watching part will require field-specific
(or at least data type-specific) knowledge, for which OKF on its own
is not necessarily the best framework, but the data requesting part
sounds like something others are very unlikely to tackle.

Daniel

--
http://okfn.org
http://wikimedia.org

On Sat, Jun 8, 2013 at 1:18 AM, Jenny Molloy <jenny.molloy at okfn.org> wrote:
> Thanks Jonathan
>
> What are people's thoughts on the angle that would be most worth taking
> here? It may be that we're talking about two separate projects...
>
> My response to Jonathan:
>
> That was the other angle we were thinking of as well. It's worth
> discussing which of these is the most useful, or whether both are, and
> whether to focus on just one.
>
> The contacting service is a good idea, it's been raised by editors at
> meetings I've been to that while they may require data sharing on request
> they hardly ever get contacted to say someone isn't doing what they're
> supposed to so it's very difficult to enforce.
>
> [OKF have] implemented data request services before although for open
> availability rather than availability per se and they weren't anonymous, but
> in principle some of the setup could be reused.
>
> Jenny
>
>
> On Fri, Jun 7, 2013 at 11:14 PM, Jonathan Eisen <jaeisen at ucdavis.edu> wrote:
>>
>> All
>>
>> Am forwarding this to the group - had sent it to Jenny Molloy in our 1-1
>> discussions
>>
>> "My original thought on this was to focus on data that was not even made
>> available yet should have been.  I had envisioned a site where anyone could
>> post a notice saying "I tried to get this data but couldn't" and even
>> offering a service to contact authors and editors anonymously so that people
>> could request access without fear of some sort of retribution.
>>
>> I had not thought of the issue of data quality that you mentioned in the
>> group email.  But I can see how that might be connected."
>>
>> Might be a little dated based on the current emails but figured I would
>> just forward this to the group ...
>>
>> Jonathan Eisen
>>
>>
>>
>>
>>
>> On Fri, Jun 7, 2013 at 2:39 PM, Laurent Gatto <lg390 at cam.ac.uk> wrote:
>>>
>>> On 7 June 2013 19:32, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>>> > I'm obviously very supportive of DataWatch.  I think that some of our
>>> > activity may be per-journal rather than per-paper (e.g. when Neuroscience
>>> > said they no longer required suppdata). DataWatch should then challenge the
>>>
>>> And maybe also aggregate at the level of topic/field? Different
>>> communities have quite different habits and views on the topic of data
>>> sharing.
>>>
>>> > policy rather than the instance. And there may be areas where we can give
>>> > POSITIVE acclaim where a journal adopts a data pub policy.
>>> >
>>> > Assuming that DataWatch takes off then it gives much more likelihood of
>>> > getting responses from editors and publishers and collating policies.
>>> >
>>> >
>>> >
>>> > On Fri, Jun 7, 2013 at 11:05 AM, Jenny Molloy <jenny.molloy at okfn.org>
>>> > wrote:
>>> >>
>>> >> Hi All
>>> >>
>>> >> I'm sure you're familiar with the excellent blog Retraction Watch run by
>>> >> Ivan Oransky and Adam Marcus http://retractionwatch.wordpress.com/
>>> >>
>>> >> In a blog post in 2012 [1], Jonathan Eisen suggested having a Data Watch
>>> >> site in the same vein. We discussed something similar in the Open Science
>>> >> Working Group at various times previously.
>>> >>
>>> >> We had considered using it to discuss both invalidated datasets (more like
>>> >> Retraction Watch) and data sharing cases where data is simply not available
>>> >> to back up published research, particularly where researchers refuse to
>>> >> share data despite agreements with funders or publishers to do so on
>>> >> request. The best-known recent examples are Reinhart-Rogoff [2] and
>>> >> (many) clinical trials [3].
>>> >>
>>> >> It would be interesting in the case of datasets found to be invalid to
>>> >> classify where the problem arose - mislabelling of columns, coding errors,
>>> >> data gaps?
>>> >>
>>> >> If you're interested in working on something like this (and the exact
>>> >> formulation of this is still very much up for discussion - all thoughts
>>> >> welcome!), then speak now and we can set up a group of founding editors :)
>>> >>
>>> >> Jenny
>>> >>
>>> >> [1] http://phylogenomics.blogspot.com/2012/01/draft-post-cleanup-3-open-knowledge.html
>>> >> [2] http://blog.okfn.org/2013/04/22/reinhart-rogoff-revisited-why-we-need-open-data-in-economics/
>>> >> [3] http://www.alltrials.net/
>>> >>
>>> >> _______________________________________________
>>> >> open-science mailing list
>>> >> open-science at lists.okfn.org
>>> >> http://lists.okfn.org/mailman/listinfo/open-science
>>> >> Unsubscribe: http://lists.okfn.org/mailman/options/open-science
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Peter Murray-Rust
>>> > Reader in Molecular Informatics
>>> > Unilever Centre, Dep. Of Chemistry
>>> > University of Cambridge
>>> > CB2 1EW, UK
>>> > +44-1223-763069
>>> >
>>>
>>>
>>>
>>> --
>>> Laurent Gatto
>>> - http://proteome.sysbiol.cam.ac.uk/lgatto/
>>> Cambridge Centre for Proteomics
>>> - http://www.bio.cam.ac.uk/proteomics
>>
>>
>>
>>
>> --
>> Jonathan A. Eisen, Ph.D
>> Professor, University of California, Davis
>> Adjunct Scientist, DOE Joint Genome Institute
>> Blog: http://phylogenomics.blogspot.com/
>> Lab Page: http://phylogenomics.wordpress.com/
>> Twitter: http://twitter.com/phylogenomics/
>> Phone: 530 752 3498 (office)
>> Phone: 530 400 6066 (cell)
>
>
>