[open-science] Data Watch

Daniel Mietchen daniel.mietchen at googlemail.com
Sat Jun 8 01:49:43 UTC 2013


Some more thoughts:
- If automation is not in reach, perhaps crowdsourcing may be an
option, for which I see two major ways:
(1) via the public (classical citizen science of the pybossa-supportable kind)
(2) via funders and the reports they request from their grantees
(Knowledge Exchange are looking to expand their activities in the
areas of open knowledge and open culture and might be open to
experiments along these lines - I have sent in an OKCon submission
with them after attending a meeting on the matter).
In both cases, much depends on a suitably designed request form (easy
to fill in, easy to tailor to the specifics of a paper or dataset).

- Getting useful metadata is often a big challenge if data sharing was
not built in from the start.

- If we think about data watching, the sharing of code should be
considered as well.

Daniel

On Sat, Jun 8, 2013 at 3:26 AM, Daniel Mietchen
<daniel.mietchen at googlemail.com> wrote:
> Spinning the idea of an anonymous data sharing on request service a
> little further: what is missing to get this automated?
>
> I guess a very practical issue would be that journal articles (with
> the exception of some types of data papers) are not marked up for the
> type of data they are reporting on, and an automated generic request
> to authors "please send me the data underlying your paper X" may well
> be less efficient in gathering responses than something of the sort
> "please send me the T2-weighted images of the control group in
> experiment 2, along with the respective blood screenings".
>
> But then again, exposing the (probably large) amount of non-responses
> may be a nice basis for triggering discussions (which would have to be
> field-specific, at least initially) on how to move towards more
> effective ways of sharing data.
>
> In cases where the authors actually do respond by sending some data,
> the next challenge is then to determine what to do with it, since it
> (a) is most likely not licensed for sharing further
> (b) may have legit restrictions on sharing (e.g. patient privacy,
> endangered species)
> (c) may be in a proprietary or non-standard format
> (d) may be incomplete
> (e) may or may not be of a type for which suitable repositories exist
> etc.
>
> These latter considerations would apply whether the requests are
> automated or not, and they could serve as the basis for a DataWatch
> initiative that looks for problems in those data that are shared or
> for which some available statements (e.g. in the respective papers)
> are unclear or contradictory.
>
> So while the data requesting and data watching ideas are heavily
> intertwined, I think they require different skill and knowledge sets
> and will probably appeal to different groups of people (some overlap
> notwithstanding). The data watching part will require field-specific
> (or at least data type-specific) knowledge, for which OKF on its own
> is not necessarily the best framework, but the data requesting part
> sounds like something others are very unlikely to tackle.
>
> Daniel
>
> --
> http://okfn.org
> http://wikimedia.org
>
>
> On Sat, Jun 8, 2013 at 1:18 AM, Jenny Molloy <jenny.molloy at okfn.org> wrote:
>> Thanks Jonathan
>>
>> What are people's thoughts on the angle that would be most worth taking
>> here? It may be that we're talking about two separate projects...
>>
>> My response to Jonathan:
>>
>> That was the other angle we were thinking of as well. Which of these is
>> the most useful, whether both are, and whether to focus on just one is a
>> discussion worth having.
>>
>> The contacting service is a good idea. Editors at meetings I've been to
>> have pointed out that, while they may require data sharing on request,
>> they hardly ever get contacted to say someone isn't doing what they're
>> supposed to, so it's very difficult to enforce.
>>
>> [OKF have] implemented data request services before, although for open
>> availability rather than availability per se, and they weren't anonymous;
>> in principle, though, some of the setup could be reused.
>>
>> Jenny
>>
>>
>> On Fri, Jun 7, 2013 at 11:14 PM, Jonathan Eisen <jaeisen at ucdavis.edu> wrote:
>>>
>>> All
>>>
>>> Am forwarding this to the group - had sent it to Jenny Molloy in our 1-1
>>> discussions
>>>
>>> "My original thought on this was to focus on data that was not even made
>>> available yet should have been.  I had envisioned a site where anyone could
>>> post a notice saying "I tried to get this data but couldn't" and even
>>> offering a service to contact authors and editors anonymously so that people
>>> could request access without fear of some sort of retribution.
>>>
>>> I had not thought of the issue of data quality that you mentioned in the
>>> group email.  But I can see how that might be connected."
>>>
>>> Might be a little dated based on the current emails but figured I would
>>> just forward this to the group ...
>>>
>>> Jonathan Eisen
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jun 7, 2013 at 2:39 PM, Laurent Gatto <lg390 at cam.ac.uk> wrote:
>>>>
>>>> On 7 June 2013 19:32, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>>>> > I'm obviously very supportive of DataWatch. I think that some of our
>>>> > activity may be per-journal rather than per-paper (e.g. when
>>>> > Neuroscience said they no longer required suppdata). DataWatch should
>>>> > then challenge the
>>>>
>>>> And maybe also aggregate at the level of topic/field? Different
>>>> communities have quite different habits and views on the topic of data
>>>> sharing.
>>>>
>>>> > policy rather than the instance. And there may be areas where we can give
>>>> > POSITIVE acclaim where a journal adopts a data pub policy.
>>>> >
>>>> > If DataWatch takes off, it makes it much more likely that we will get
>>>> > responses from editors and publishers and be able to collate policies.
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Jun 7, 2013 at 11:05 AM, Jenny Molloy <jenny.molloy at okfn.org>
>>>> > wrote:
>>>> >>
>>>> >> Hi All
>>>> >>
>>>> >> I'm sure you're familiar with the excellent blog Retraction Watch run by
>>>> >> Ivan Oransky and Adam Marcus http://retractionwatch.wordpress.com/
>>>> >>
>>>> >> In a blog post in 2012 [1], Jonathan Eisen suggested having a Data Watch
>>>> >> site in the same vein. We discussed something similar in the Open Science
>>>> >> Working Group at various times previously.
>>>> >>
>>>> >> We had considered using it to discuss both invalidated datasets (more like
>>>> >> Retraction Watch) and data sharing cases where data is simply not available
>>>> >> to back up published research, particularly where researchers refuse to
>>>> >> share data despite agreements with funders or publishers to do so on
>>>> >> request. The best-known recent examples are Reinhart-Rogoff [2] and
>>>> >> (many) clinical trials [3].
>>>> >>
>>>> >> It would be interesting, in the case of datasets found to be invalid, to
>>>> >> classify where the problem arose: mislabelling of columns, coding errors,
>>>> >> data gaps?
>>>> >>
>>>> >> If you're interested in working on something like this (and the exact
>>>> >> formulation of this is still very much up for discussion - all thoughts
>>>> >> welcome!), then speak now and we can set up a group of founding editors :)
>>>> >>
>>>> >> Jenny
>>>> >>
>>>> >> [1] http://phylogenomics.blogspot.com/2012/01/draft-post-cleanup-3-open-knowledge.html
>>>> >> [2] http://blog.okfn.org/2013/04/22/reinhart-rogoff-revisited-why-we-need-open-data-in-economics/
>>>> >> [3] http://www.alltrials.net/
>>>> >>
>>>> >> _______________________________________________
>>>> >> open-science mailing list
>>>> >> open-science at lists.okfn.org
>>>> >> http://lists.okfn.org/mailman/listinfo/open-science
>>>> >> Unsubscribe: http://lists.okfn.org/mailman/options/open-science
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Peter Murray-Rust
>>>> > Reader in Molecular Informatics
>>>> > Unilever Centre, Dep. Of Chemistry
>>>> > University of Cambridge
>>>> > CB2 1EW, UK
>>>> > +44-1223-763069
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Laurent Gatto
>>>> - http://proteome.sysbiol.cam.ac.uk/lgatto/
>>>> Cambridge Centre for Proteomics
>>>> - http://www.bio.cam.ac.uk/proteomics
>>>
>>>
>>>
>>>
>>> --
>>> Jonathan A. Eisen, Ph.D
>>> Professor, University of California, Davis
>>> Adjunct Scientist, DOE Joint Genome Institute
>>> Blog: http://phylogenomics.blogspot.com/
>>> Lab Page: http://phylogenomics.wordpress.com/
>>> Twitter: http://twitter.com/phylogenomics/
>>> Phone: 530 752 3498 (office)
>>> Phone: 530 400 6066 (cell)
>>
