[open-science] Openness and Licensing of (Open) Data

Rufus Pollock rufus.pollock at okfn.org
Tue Feb 10 17:18:56 UTC 2009


Dear John,

Thanks for your clarifications/comments. I've responded inline below
in the usual fashion. I wanted to pose a couple of specific questions
to focus the discussion but since this post (and thread) is getting a
little long I think I am going to send them separately.

Rufus

2009/2/9 John Wilbanks <wilbanks at creativecommons.org>:
>>
>> > 1. it is easy to imagine that in 14 we have very valuable data web,
>> > and that a db under the eu directive is in there and the owner wants
>> > value now. a lawsuit under ip has the power to bring down lots of
>> > other stuff. this is a very plausible future under a licensing regime
>> > but impossible under a normative regime.
>>
>> Not sure I follow here (perhaps typos).
>
> What I mean is that in 14 years, a lawsuit can be filed under a license that
> can bring down significant federations of data into which that data was
> integrated. That lawsuit cannot be filed under a normative system - if a
> community wants to punish one of its own members it can do so using its own
> systems, but the legal status of the data web is secure.

Just to be clear: you mean the legal status is secure if everyone has
correctly licensed/dedicated their stuff to the PD.

Also I really am not convinced that allowing attribution is going to
lead to 'significant federations of data' being 'brought down'.

Imagine I come along and say: 'hey you are not attributing me'

All I need to do to comply is to add that attribution and everything
is fine. (I'd also note that even in the norms case, where there is no
'formal' obligation, the data holder would probably want to comply
with this kind of thing.)

>> > 2 the involuntary infringer is liable for damages under licenses, even
>> > if she was not the person who removed attribution or sharealike
>> > clauses two steps back in the data web. if there is monetary value to
>> > be had, this is going to happen. endorsing open licenses = endorsing
>> > enforcement, even if the intent is not to go to court. and if we don't
>> > plan to go to court, why even allow the chance for this?
>>
>> Because not planning does not mean it never happens (or that the threat
>> of it happening is not important).
>
> You're assuming good actors use the open licenses. I'm assuming all kinds of
> actors use them. It's so hard to figure out where copyright stops and starts

I wasn't -- but I was pointing out that enforcement can be
important to good guys.

> in a database that it's pretty easy to see scenarios where people assume the
> human genome is copyrighted, or map data is copyrighted, and sue for failed
> attribution. The strict liability nature of copyright is a real problem

And all you need to do is attribute ... (as you would with e.g. norms)

> here. We're not talking anything like software, where you *have* to have
> copyright licenses. We're choosing to put them in places where they don't
> natively belong. And what we are doing is creating a world in which this

But this just isn't true. There are rights in data all over the place
(and even in the US it is debatable -- Feist doesn't remove
everything!).

> kind of lawsuit is plausibly predictable. Why? Given that the data licenses
> are much less enforceable from a copyright perspective than the software
> licenses (again due to the underlying nature of the content) why is this
> worth the so easily seen risk?

I think I should reiterate an earlier distinction I suggested between:

1. What we allow as 'open' data
2. What we recommend

Saying sharealike and attribution should be allowed for open stuff is
not the same as saying it is recommended (I personally use MIT/BSD
type stuff for what I do but I know people who really like GPL ...)

Re your specific risk point: I see the risk as low (esp. for
attribution) and benefits as significant.

> Stallman and Mako taught me this. If you can enable an evil use with your
> tools, you are obliged to contemplate it. It's an important aspect at all
> times. PD + norms mean that someone can take a copy of a database private,
> and they don't have to share their stuff under the law. But they also mean
> that anyone else in the world can keep a copy of that database online and
> public. No one's use can enclose anyone else's use for any purpose and bad

I think the concern is less about not sharing the db itself than about
not sharing modifications and improvements to it.

> actors can't sue when they suddenly realize there's a valuable data web.
> Remember that every google query in a data web is a derivative data product,
> and attribution will be legally required with preliminary injunction power
> available to every searched database under a license scheme. That's the
> outcome of licensing choices.

Again I come back to the recommendation versus permit option (or: you
don't have to use attribution and share-alike if you don't want).

Suppose we permit share-alike and attribution. Users of data
will have options of using PD data, attribution data and share-alike
data. If attribution or share-alike is a real risk for users they can
and will choose PD data. (Likely leading to PD being the default and
SA and Attribution withering away :) )

>> I also really don't understand the logic here. Court only happens due
>> to enforcement when people don't obey (and intentionally -- it is
>> extremely unlikely you are going to get in trouble for inadvertently
>> failing to comply ...). I.e. it is important precisely when 'norms'
>> aren't cutting it ...
>>
>> It seems to me that the status of the GPL as a license rather
>> than a norm has been important in getting compliance from some people
>> over time -- and more importantly in encouraging others to contribute
>> to GPL'd projects in the first place.
>
> Stallman had no choice but to use copyrights. We have a choice. He had to

My understanding -- though I may be wrong -- was that he also wanted
more: to ensure that what he did wasn't 'closed' (and keeping his own
'individual' patch set 'PD' would not have been enough: he needed that
those who took his work and built upon it also gave back).

> break down a culture of closure. We have a culture of openness. We have a
> chance to do things differently, to not build a world based on ideas of
> control and contract, to encode a world that in many ways already exists.

Woah there :) I feel things are getting a bit inverted:

We all agree that open (however done) will mean freedom to use, reuse
and redistribute however you want subject to (at most) 2 reservations:

1. Attribution
2. Share-alike

While I agree that allowing reservations involves more 'control' than
pure PD I think we are a very long way from the control and contract of
the standard system.

>> > 3. fragmentation. data disciplines vary much more than software. their
>> > norms swing wildly, as do secondary regimes like privacy. creating the
>> > idea that one or two licenses suffice doesn't reflect this - it will
>> > force disciplines to choose between their norms and technolegal
>> > interoperability. and if the norms all get encoded in their own
>> > licenses we will indeed see fragmentation. this scenario also means
>> > that cross discipline mashup will be legally difficult.
>>
>> I am not sure I really buy this. Code gets used across a lot of different
>> areas
>
> Physicists have different norms than biologists, who have different norms
> than anthropologists. Trust me, not everyone in science is going to use a
> map license. Choosing the encoding of norms in contract in the sciences

Did I say they would :)

> essentially guarantees that we will have legal fragmentation, because the
> norms differ wildly across the disciplines.

But are we encoding the norms of the disciplines or what people can do
with some data? While, of course, these two items may be related, they
are a long way from being the same.

[snip]

>> Plus we are not talking about the whole governance of a discipline but
>> data sharing. Moreover we are also only talking about encoding 'open'
>> data across disciplines (and the only things we are possibly allowing
>> away from PD is attribution and Share-alike).
>
> That's what *you* are endorsing. But by endorsing licenses, you can bet a
> lot of other licenses are going to be created with non commercial terms and
> more. It's kind of hard to criticize the incremental changes in licenses
> when you've endorsed the idea that we should encode norms in licenses,
> because then what you're doing is saying your norms are better than mine.
> Fragmentation.

The creation of data licenses with NC or other terms has happened (and
is going to go on happening) whatever we do. Specifying what we
mean by open (whether in norms or licenses) and encoding that in some
way is going to make very little difference.

I also don't agree that we can't keep a decent, and clear, line here.

>> > norms allow for formal encoding but informal punishment. this
>> > flexibility, when we don't know how this will all play out, just seems
>> > a more wise choice to me. it lets experimentation happen without fear
>> > of lawsuit, and lets each community encode its ideals without breaking
>> > the power to integrate it all.
>>
>> But a) there is flexibility in licenses (look at attribution with CC
>> licenses for content) and b) what about the flip-side (non-compliance)?
>
> In reality this doesn't get used. Non compliance in data to me is a feature,
> not a bug. I want a million kinds of data mashup a lot more than I want
> control. That's what it's going to take to deal with climate change and
> human health.

Dealing with climate change is more about politics than .... but
that's a separate matter ;)

I agree that we want freedom more than we want control (was I arguing
otherwise?) but this discussion is about something different:

a) Should the open data (in science) community include those who wish
to use share-alike or attribution
b) Should it include those who use licenses rather than norms

E.g. it's one thing for me to just use norms for my open stuff if I
want to but I don't see why others can't use a license if they want.

>> So I think we agree that whatever happens data access is going to be
>> covered by fairly formal documents that specify what you are supposed
>> to do. (So we should be clear that unlike most 'norms' these are going
>> to be fairly formalized).
>
> Absolutely. SC is actually going to prepare a formal norms document that
> looks and feels like a CC license, will have metadata, etc., that will
> encode attribution (though we're changing the name to citation to avoid the
> legal implications of attribution in a copyright sense) and share-alike
> (though we'll also change the name for the same reason). It will simply be
> empty in the middle.

I know Jordan's already got something at:

<http://www.opendatacommons.org/odc-community-norms/>

Perhaps that's usable already -- if not, any comments on modifications
would be welcome ...



