[okfn-discuss] Re: [SPARC-OpenData] data sharing

Rufus Pollock rufus.pollock at okfn.org
Fri Dec 22 13:26:13 UTC 2006


John Wilbanks wrote:
> Hi all, chiming in here...just joined the list.
> 
> Rufus Pollock wrote:
>> "There will of course be some adaptation needed but the basic principles 
>> that must be addressed by an 'open' licence, be it for content or for 
>> scientific data, are essentially the same:
>> 
>> 1. Freedom to use, reuse and redistribute
>> 2. attribution (or not)
>> 3. sharealike (or not)
>> 
>> Of course I agree that there are some aspects of the CC licences that 
>> strike one as rather content oriented (e.g. talk of 'public 
>> performance') but these would in no way invalidate the licence. Furthermore 
>> CC licences are already being used to licence various sets of geodata 
>> and we at the OKF have been using them for datasets we've produced. I 
>> also note that, for example, the Dutch creative commons team explicitly 
>> wrote in provisions to deal with database rights in the Dutch CC licences."
> 
> The lack of international consensus on data makes use of CC licenses for 
> data problematic.  The EU directive doesn't exist elsewhere, and is not 

Yes, the IPR status for collections of data varies greatly by jurisdiction. 
Some have no protection at all, others have common-law copyright (e.g. 
Australia, and US pace Feist), the EU has copyright + a sui generis 
right etc.

> written into the majority of even the EU licenses.  This makes

Yes, though that perhaps can be fixed via a suggestion to the various 
local drafting teams ...

> interoperation along the CCi model (where I can upload a file in the US, 
> and you can download it in Brazil) much harder to achieve, as the IPR 
> does not exist in the US.

Sure, though:

(a) I think a lot of the jurisdictions have some kind of IPR that can be 
used. Furthermore these kinds of 'interoperability' issues already exist 
with the CC licences for content. A lot of people in England will just 
point to the standard US cc licences rather than the E&W licence even 
though it is not precisely customized to the national law. What one 
really wants is a clause in any national licence saying: when you 
licence under this licence you licence under the equivalent licence in 
all other jurisdictions. Such a provision would also work for CC 
licences attached to data.

(b) CC licences aren't just legal documents; they are also a way of 
encoding the 'social contract'. Thus even if it turns out a licence is 
not perfectly enforceable, attaching it to a work provides a useful 
signal to others of what the creator (or owner) wishes to permit. 
Particularly in the academic community such 'intentions' will carry 
strong weight since violation can be sanctioned in all kinds of 
non-legal ways.

> SC is examining the idea that data be simply tagged as public domain, 
> with terms of use requesting attribution.  The extension of copyright to

But many may not want their data to be public domain. Look at gracenote 
and freedb: the archetypal example of an appropriation of the commons. 
Many people want a sharealike provision in their data licences. Of 
course, in jurisdictions where there are no underlying legal rights this 
is meaningless, but in many jurisdictions it will not be.

> data, though it would let CC licenses be used, could also result in the 
> automatic assignment of copyrights to all data sets - which means that 
> if sharing licenses were *not* attached we would likely see a vast space 
> of orphan data with all rights reserved.  It seems to be a feature of 
> copyrighted content.

I am not sure I understand this. Either rights already exist in such 
datasets (though they may not be exercised) or they do not. If they do 
not then attaching CC licences won't suddenly create these rights. If 
they do, then attaching CC licences has just made the situation clearer. 
I do appreciate that in reality there is, of course, quite a lot of 
greyness over what one is allowed to do and this can be beneficial 
because it allows people informally to do stuff they might not be 
formally allowed to (the classic case of this is provided by the data in 
Walsh, Cohen and Cho who show that one reason patents in biotech have 
not had much impact on researchers is that the patents are routinely 
ignored out of ignorance, see [1]). That said, I think that in the long 
run it is better to be explicit (more discussion on a similar theme 
can be found in [2]).

[1]:http://www.thefactz.org/ideas/archives/134
[2]:http://lists.okfn.org/pipermail/okfn-discuss/2006-October/000166.html

> Whereas a public domain designation with some terms of use would by 
> definition allow the use, reuse, and distribution of data, without the 
> need for a binding intellectual property license.  In some cases, using 
> intellectual property - which is a blunt instrument - can have dreadful 
> unintended consequences.

It can, though as I just said I'm not sure how attaching a licence would 
create such IPRs. Rather it might make people *aware* that such IPR 
exists -- in which case we are back to the previous point.

> Also, our research is unclear as to what "attribution" and "share alike" 
> mean in the context of data.  What if I run a query across 10,000 gene 
> expression data sets?  If I access only one record per data set? 
> Attribution and derivative works are terms built for copyrights, and the 
> legal implications might mean you have to attribute 10,000 people every 
> time you generate a data set.   The normative values of each field of 
> science work pretty well for this already...

These are hard questions, but again, if the IPRs already exist these 
are questions that will have to be faced whether there is a licence or 
not. Furthermore, the courts have already been struggling (perhaps 
rather unsuccessfully) here in the EU and elsewhere to define these 
kinds of things. For example, the EU DB directive talks about the 
right 'to prevent extraction and/or reutilization of the whole or of a 
substantial part, evaluated qualitatively and/or quantitatively, of the 
contents of that database.' For more on this see, e.g.:

http://www.ivir.nl/publications/hugenholtz/fordham2001.html

> Paul Uhlir made a very important point to me in person at the CODATA 
> meetings in Beijing.  A "commons" isn't just a place where "some rights" 
> are reserved.  It's a place where "some rights, or no rights" are 
> reserved.  Data may well fall into the latter category.

Absolutely, though, as mentioned above, I would not underestimate the 
attractiveness of 'share-alike' provisions. In my own experience so far 
with licensing discussions there has been strong support for these 
kinds of provisions -- and we should also note the prevalence of the GPL 
in the F/OSS community.

> However, as I said, we're examining the idea, and welcome the discussion.
> 
> Now, if you have a database, we have created a FAQ for owners, and 
> uniprot.org (the world's largest database of biological protein 
> information) uses the CC license in the following manner:
> 
> "We have chosen to apply the Creative Commons Attribution-NoDerivs 
> License to all copyrightable parts of our databases. This means that you 
> are free to copy, distribute, display and make commercial use of these 
> databases, provided you give us credit. However, if you intend to 
> distribute a modified version of one of our databases, you must ask us 
> for permission first." (http://www.pir.uniprot.org/terms.shtml)

To my mind this would mean that the database was *not* open/free in the 
sense of the open knowledge/data definition:

   http://okd.okfn.org/

What is their motivation for doing this? I assume it is an integrity 
concern, e.g. they don't want different versions of the database floating 
around the net, all slightly different. However, why couldn't this be 
addressed by a standard provision of the Perl type: 'if you modify this 
database you must *not* distribute it under the same name and must 
clearly identify that it has been modified'?

Regards,

Rufus Pollock
Open Knowledge Foundation
http://www.okfn.org/



