The purpose of this post is to illustrate why not all data can be open, even in a research program dedicated to open access and grounded in a strong open philosophy.

Sustaining the Knowledge Commons (SKC), sustainingknowledgecommons.org, is a research program dedicated to facilitating the transition of the economics of scholarly communication from subscriptions / purchase to open access. As the Principal Investigator, I have a strong commitment to open access. The team publishes results-in-progress via a blog, and we have a dataverse for open sharing of data. However, not all data can be open. Here is why.

SKC includes two lines of research. Open Access Article Processing Charges is a longitudinal study of the APCs charged by open access journal publishers that use this funding method. Some of the data from this study is open, but not all. Resource requirements is a qualitative-to-quantitative study of what small scholar-led publishers need to survive and thrive in an OA environment. Some of this data is open, some closed, and some in between.

Open Data / OA APC dataverse

We aim to publish the full dataset as open data* in the OA APC dataverse:

Working data / not open but not necessarily closed

The 2016 OA APC dataset will be posted soon. It is not yet open data because the dataset still needs work: clean-up, quality assurance, and documentation. Behind the datasets lies a large quantity of working data of various types. Some of this data may eventually be made open. It is not open right now because a) without documentation and clean-up, the data is very difficult to understand; b) I am not confident that our research team's current data-sharing tool, Google Docs, is sufficiently secure, i.e. I would prefer not to "open" the data to hacking; c) without academic analysis, the legal status of information gathered from publishers' websites is unclear, i.e. if there is no processing, I do not have a strong fair dealing claim; and d) preparing this data for open release would add workload of unknown value. I might be able to share some of this data with other researchers on request, but this kind of data is not likely ever to be fully open.

Data that is and must be confidential and closed

Another line of research involves qualitative research on the resource requirements of small scholar-led publishers. This data comes from interviews and focus groups with journal editors and was gathered on conditions of confidentiality. Interviewees have the option of making their own data open; no one has chosen this option. This data can only be made available after careful attention to anonymization, because even identifying the journal's country and discipline and the interviewee's position would identify both the journal and the individual. If researchers like me wish to have people talk about their journals in terms of the work involved, the support (or lack of support) for open access from editorial board members, their confidential financial data, etc., we need to be able to make strong commitments to maintaining confidentiality. Before anyone else can view this data, I have to have them sign a commitment to confidentiality and have them formally included on my ethics certificate.

* This is open data with no open license. I realize that some on this list would dispute that this is possible. There are reasons for not using a license, as explained in this article: http://www.mdpi.com/2306-5729/1/1/4

Suggestion: at this point in time, working with researchers to understand the nature of the research data collected, and the barriers, challenges and opportunities involved in opening up that data, might be more fruitful than attempting to implement and assess a universal open data policy.


