[open-science] Openness and Licensing of (Open) Data

Neylon cameron.neylon at stfc.ac.uk
Fri Feb 6 16:01:10 UTC 2009


Yishay, I am afraid it isn't short but I've tried to summarise:
http://blog.openwetware.org/scienceintheopen/2009/02/06/licenses-and-protocols-for-open-science-the-debate-continues/

This is an important discussion that has been going on in disparate places,
but at the moment it is mainly taking place on the Open Science mailing list
maintained
by the OKF. To try and keep things together and because Yishay Mor asked, I
thought I would try to summarize the current state of the debate.

The key aim here is to find a form of practice that will enhance data
availability, and protect it into the future.

There is general agreement that there is a need for some sort of declaration
associated with making data available. Clarity is important and the minimum
here would be a clear statement of intention.

Where there is disagreement is over what form this should take. Rufus
Pollock started by giving the reasons why this should be a formal license.
Rufus believes that a license provides certainty and clarity in a way that a
protocol, statement of principles, or expression of community standards
cannot. I, along with Bill Hooker and John Wilbanks [links are to posts on
mailing list], expressed a concern that actually the use of legal language,
and the notion of "ownership" of this by lawyers rather than scientists
would have profound negative results. Andy Powell points out that this did
not seem to occur either in the Open Source movement or with much of the
open content community. But I believe he also hits the nail on the head with
the possible reason:

"I suppose the difference is that software space was already burdened with
heavily protective licences and that the introduction of open licences was
perceived as a step in the right direction, at least by those who like that
kind of thing."

Scientific data has a history of being assumed to be in the public domain (see
the lack of any license at PDB or Genbank or most other databases) so there
isn't the same sense of pushing back against an existing strong IP or licensing
regime. However I think there is broad agreement that this protocol or
statement would look a lot like a license and would aim to have the legal
effect of at least providing clarity over the rights of users to copy,
re-purpose, and fork the objects in question.

Michael Nielsen and John Wilbanks have expressed a concern about the
potential for license proliferation and incompatibility. Michael cites the
example of Apache, Mozilla, and GPL2 licenses. This feeds into the issue of
the acceptability, or desirability, of share-alike provisions, which is an
area of significant division. Heather Morrison raises the issue of dealing
with commercial entities who may take data and use technical means to
effectively take it out of the public domain, citing the takeover of OAIster
by OCLC as a potential example.

This is a real area of contention I think because some of us (including me)
would see this in quite a positive light (data being used effectively in a
commercial setting is better than it not being used at all) as long as the
data is still both legally and technically in the public domain. Indeed this
is at the core of the power of a public domain declaration. The issue of
finding the resources that support the preservation of research objects in
the (accessible) public domain is a separate one but in my view if we don't
embrace the idea that money can and should be made off data placed in the
public domain then we are going to be in big trouble sooner or later because
the money will simply run out.

On the flip side of the argument is a strong tradition of arguing that viral
licensing and share-alike provisions protect the rights and personal
investment of individuals and small players against larger commercial
entities. Many of the people who support open data belong to this tradition,
often for very good historical reasons. I personally don't disagree with the
argument on a logical level, but I think for scientific data we need to
provide clear paths for commercial exploitation because using science to do
useful things costs a lot of money. If you want people to invest in
using the outputs of publicly funded research you need to provide them with
the certainty that they can legitimately use that data within their current
business practice. I think it is also clear that those of us who take this
line need to come up with a clear and convincing way of expressing this
argument because it is at the centre of the objection to "protection" via
licenses and share-alike provisions.

Finally Yishay brings us back to the main point. Something to keep focussed
on:

"I may be off the mark, but I would argue that there's a general principle
to consider here. I hold that any data collected by public money should be
made freely available to the public, for any use that contributes to the
public good. Strikes me as a no-brainer, but of course - we have a long way
to go. If we accept this principle, the licensing follows."

Obviously I don't agree with the last sentence - I would say that dedication
to the public domain follows - but the principle I think is something we can
agree that we are aiming for.


On 6/2/09 12:00, "Yishay Mor" <yishaym at gmail.com> wrote:

> I may be off the mark, but I would argue that there's a general principle to
> consider here. I hold that any data collected by public money should be made
> freely available to the public, for any use that contributes to the public
> good. Strikes me as a no-brainer, but of course - we have a long way to go. If
> we accept this principle, the licensing follows.

