[open-science] Open research data raises real issues not found in Open access

Sun Aug 7 18:24:02 UTC 2016

** with apologies for cross-posting **

282 investigators in 33 countries have endorsed a perspective on "fairness in data sharing", expressing concern about policies requiring clinical trial data publication, particularly with short deadlines:
http://www.nejm.org/doi/full/10.1056/NEJMp1605654

The purpose of this email is to invite my colleagues in the open science and open access communities to take the concerns raised seriously, and not to treat this as if it were the kind of anti-OA resistance to change we are accustomed to in open access.

The original focus of the open access movement was the scholarly peer-reviewed journal article that scholars have traditionally published without expectation of payment.

Research data has not traditionally been published. The kinds of issues that arise from opening up research data are very different from the issues in open access. This group has raised some serious issues that merit serious research and discussion rather than dismissal. A few such issues in particular (from the NEJM):

- open data policies could be a disincentive to conduct some types of research, or to publish early results of multi-stage research projects
- redirecting researchers' time and attention from conducting research and publishing results to preparing data for publication. Is this always worth it?
- concerns about quality of downstream research results

This is not a full list of issues that will arise with respect to open data. Some others (my list) will include:
- privacy of research subjects (anonymization is not always possible, especially with small groups)
- data that belongs to third parties and is subject to their data policies; I expect this to grow as more organizations have interesting data for researchers to work with
- potential for errors in downstream research arising from lack of understanding of nuances of definition of variables and variations in sampling and data collection
- formatting and metadata for data sharing and interoperability

Please note that I speak as an early adopter of open data, one of the first to sign up for Harvard's Dataverse while it was in pilot phase years ago, and an avid practitioner of openness with respect to my own data. This is relatively easy for me as my data comes from the open web. In my experience, this is worthwhile but does take extra time, even when there is no need for anonymization and the potential harm from downstream errors is much smaller than in an area like medicine.

My recommendations in this area:
- take the concerns of the NEJM group seriously, engage in discussion and undertake research on the issues. It is hard to get researchers to take the time to engage in these issues. This group is expressing an interest; let's chat with them
- encourage and support open data sharing (by providing infrastructure and developing incentives for sharing), but do not require it; or, if we do have policies, at a minimum leave an opening for waivers for stated scholarly reasons
- undertake or provide support for research on what needs to happen to achieve the potential benefits of data sharing (see list of issues above to start)
- develop open data policies on a case-by-case rather than broad-brush basis, e.g. let's have open, reusable government GIS data rather than locked-down pictures today (as with scholarly articles, the basic decisions about publishing have already been made), but work with researchers in areas where data has not traditionally been made public to fully understand the issues and move forward slowly, thoughtfully, and with the support of the research communities

best,

Heather Morrison
sustainingknowledgecommons.org
