[open-science] Let us denounce the pseudo-open Public Library of Science

Tue Feb 14 15:54:42 UTC 2017

My argument (so far) is that replicability is a poor argument for open data. Open data does not facilitate replication, nor is it necessary for replication.

To help explain further why this matters:

Opening up data opens up new avenues for research that could be very valuable in advancing our knowledge. I think this is something we agree on. My concern here is that if we believe that repeating the analysis of the same data with the same analytic tools equals replication, we set ourselves up for false confidence in findings that may be highly repeatable (using the same data) but of unknown validity.

New research methods require new research into the methodology itself.

best,

Heather Morrison

On 2017-02-14, at 10:34 AM, Thomas Kluyver <takowl at gmail.com> wrote:

On 14 February 2017 at 14:58, Paola Di Maio <paola.dimaio at gmail.com> wrote:
1. Does open data facilitate replicability? I argue that it does not. At most, open data permits repeat analysis of the same data. This is a good thing, but it is not replication. To replicate a study, one must repeat the study, sometimes with variations to eliminate limitations of prior studies, and gather new data.

To replicate a study, one must repeat the study,
assuming that by 'study' you mean the application of a methodology.

But to replicate the result of a study, one needs the exact data that the study used. What if I get different results from the same study (method)? What would that imply?

To me, the key here is that a lot of modern science hinges on how you analyse the data. A classic experiment like the candle in a jar pulling up water has a clear result which doesn't require much analysis. But modern research often involves trying to determine whether a pattern or difference in some numbers represents a real phenomenon or just random chance. Things like confounding correlated factors, multiple tests and so forth can make a big difference. When more of the steps involve slicing and dicing numbers after the experiment itself, replication of the method to get from raw data to conclusions becomes more important.
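As a rough illustration of why multiple tests matter, here is a minimal sketch in Python. It assumes the tests are independent and all share one significance threshold, which real analyses often do not satisfy; the function name is made up for this example.

```python
def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent,
    so each test avoids a false positive with probability (1 - alpha).
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # With a 0.05 threshold, the chance of a spurious "finding"
    # grows quickly as more comparisons are made on the same data.
    for num_tests in (1, 5, 20):
        rate = family_wise_error_rate(0.05, num_tests)
        print(f"{num_tests:2d} tests -> {rate:.2f}")
```

With 20 such tests the chance of at least one false positive is already well over one half, which is why it helps to be able to see how many comparisons went into a reported result.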

The polling for the US election is a good example of this. Almost all pollsters predicted a Clinton win, with varying degrees of confidence. We know how that turned out. I don't believe they were fabricating the raw poll results, but their segmentation and 'likely voter' adjustments weren't quite right. I've no idea if they release the raw data from that, and there may be issues with personal identifiability, but it would be interesting to reproduce their headline results and do some sensitivity analysis to see what assumptions might have been incorrect.
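To make the idea of a sensitivity analysis concrete, here is a minimal sketch in Python. The poll numbers, group names and turnout figures are invented purely for illustration and do not come from any real pollster; the weighting scheme is a simple stand-in for a 'likely voter' adjustment, not anyone's actual method.

```python
# Hypothetical raw poll: (group, respondents, share backing candidate A).
# All numbers are invented for illustration only.
RAW_POLL = [
    ("group_1", 400, 0.60),
    ("group_2", 600, 0.45),
]

# Assumed probability that a respondent in each group actually votes.
BASE_TURNOUT = {"group_1": 0.5, "group_2": 0.5}


def headline_share(raw_poll, turnout):
    """Turnout-weighted share of the vote for candidate A."""
    weighted_support = 0.0
    total_weight = 0.0
    for group, respondents, share in raw_poll:
        weight = respondents * turnout[group]
        weighted_support += weight * share
        total_weight += weight
    return weighted_support / total_weight


def sensitivity(raw_poll, base_turnout, group, values):
    """Recompute the headline share while varying one group's turnout."""
    results = {}
    for value in values:
        turnout = dict(base_turnout)  # copy so the baseline is untouched
        turnout[group] = value
        results[value] = headline_share(raw_poll, turnout)
    return results


if __name__ == "__main__":
    for value, share in sensitivity(
        RAW_POLL, BASE_TURNOUT, "group_2", [0.4, 0.5, 0.6, 0.7]
    ).items():
        print(f"group_2 turnout {value:.1f} -> candidate A share {share:.3f}")
```

In this toy setup the same raw responses put candidate A above or below half depending only on the turnout assumed for one group, which is the kind of question one could ask of real polls if the raw data were available.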

We can also reproduce the analysis steps much more easily than steps that involve physical experiments, polling, etc. So sharing raw data is a useful part of replication, though clearly not the whole story.

Thomas
