[open-science] Openness and Licensing of (Open) Data

John Wilbanks wilbanks at creativecommons.org
Mon Feb 9 15:12:08 UTC 2009


>
>
> > 1. it is easy to imagine that in 14 we have very valuable data web,
> > and that a db under the eu directive is in there and the owner wants
> > value now. a lawsuit under ip has the power to bring down lots of
> > other stuff. this is a very plausible future under a licensing regime
> > but impossible under a normative regime.
>
> Not sure I follow here (perhaps typos).


What I mean is that in 14 years, a lawsuit can be filed under a license that
can bring down significant federations of data into which that data was
integrated. That lawsuit cannot be filed under a normative system - if a
community wants to punish one of its own members it can do so using its own
systems, but the legal status of the data web is secure.


> > 2 the involuntary infringer is liable for damages under licenses, even
> > if she was not the person who removed attribution or sharealike
> > clauses two steps back in the data web. if there is monetary value to
> > be had, this is going to happen. endorsing open licenses = endorsing
> > enforcement, even if the intent is not to go to court. and if we don't
> > plan to go to court, why even allow the chance for this?
>
> Because not planning does not mean it never happening (or the threat
> of it happening not being important).


You're assuming good actors use the open licenses. I'm assuming all kinds of
actors use them. It's so hard to figure out where copyright stops and starts
in a database that it's pretty easy to see scenarios where people assume the
human genome is copyrighted, or map data is copyrighted, and sue for failed
attribution. The strict liability nature of copyright is a real problem
here. We're not talking about anything like software, where you *have* to have
copyright licenses. We're choosing to put them in places where they don't
natively belong. And what we are doing is creating a world in which this
kind of lawsuit is plausibly predictable. Why? Given that the data licenses
are much less enforceable from a copyright perspective than the software
licenses (again due to the underlying nature of the content), why is this
worth such an easily foreseen risk?

Stallman and Mako taught me this. If you can enable an evil use with your
tools, you are obliged to contemplate it. It's an important aspect at all
times. PD + norms mean that someone can take a copy of a database private,
and they don't have to share their stuff under the law. But they also mean
that anyone else in the world can keep a copy of that database online and
public. No one's use can enclose anyone else's use for any purpose, and bad
actors can't sue when they suddenly realize there's a valuable data web.

Remember that every Google query in a data web is a derivative data product,
and attribution will be legally required with preliminary injunction power
available to every searched database under a license scheme. That's the
outcome of licensing choices.


>
> I also really don't understand the logic here. Court only happens due
> to enforcement when people don't obey (and intentionally -- it is
> extremely unlikely you are going to get in trouble for inadvertently
> failing to comply ...). I.e. it is important precisely when 'norms'
> aren't cutting it ...
>
> It seems to me that the status of the GPL as a license rather
> than a norm has been important in getting compliance from some people
> over time -- and more importantly in encouraging others to contribute
> to GPL'd projects in the first place


Stallman had no choice but to use copyrights. We have a choice. He had to
break down a culture of closure. We have a culture of openness. We have a
chance to do things differently, to not build a world based on ideas of
control and contract, to encode a world that in many ways already exists.


> > 3. fragmentation. data disciplines vary much more than software. their
> > norms swing wildly, as do secondary regimes like privacy. creating the
> > idea that one or two licenses suffice doesn't reflect this - it will
> > force disciplines to choose between their norms and technolegal
> > interoperability. and if the norms all get encoded in their own
> > licenses we will indeed see fragmentation. this scenario also means
> > that cross discipline mashup will be legally difficult.
>
> I am not sure I really buy this. Code gets used across a lot of different
> areas


Physicists have different norms than biologists, who have different norms
than anthropologists. Trust me, not everyone in science is going to use a
map license. Choosing the encoding of norms in contract in the sciences
essentially guarantees that we will have legal fragmentation, because the
norms differ wildly across the disciplines.

Code is different. It gets used all over the place, but the invisible
college of programming connects the writers. The norms of programming are
encoded in the licenses. But the clinical research scientist and the high
energy physicist and the chemist and the biologist and the sensor-net
geospatial scientist...well, they're different. And if their norms get
encoded in law, and in different kinds of share alike, the kind of data
integration we need for climate change is going to be a hell of a lot
harder.


>
> Plus we are not talking about the whole governance of a discipline but
> data sharing. Moreover we are also only talking about encoding 'open'
> data across disciplines (and the only things we are possibly allowing
> away from PD is attribution and Share-alike).
>

That's what *you* are endorsing. But by endorsing licenses, you can bet a
lot of other licenses are going to be created with non-commercial terms and
more. It's kind of hard to criticize the incremental changes in licenses
when you've endorsed the idea that we should encode norms in licenses,
because then what you're doing is saying your norms are better than mine.
Fragmentation.

>
> > norms allow for formal encoding but informal punishment. this
> > flexibility, when we don't know how this will all play out, just seems
> > a more wise choice to me. it lets experimentation happen without fear
> > of lawsuit, and lets each community encode its ideals without breaking
> > the power to integrate it all.
>
> But a) there is flexibility in licenses (look at attribution with CC
> licenses for content) and b) what about the flip-side (non-compliance)
>

In reality that flexibility doesn't get used. Non-compliance in data to me is a feature,
not a bug. I want a million kinds of data mashup a lot more than I want
control. That's what it's going to take to deal with climate change and
human health.

>
> So I think we agree that whatever happens data access is going to be
> covered by fairly formal documents that specify what you are supposed
> to do. (So we should be clear that unlike most 'norms' these are going
> to be fairly formalized).
>

Absolutely. Science Commons (SC) is actually going to prepare a formal norms document that
looks and feels like a CC license, will have metadata, etc., that will
encode attribution (though we're changing the name to citation to avoid the
legal implications of attribution in a copyright sense) and share-alike
(though we'll also change the name for the same reason). It will simply be
empty in the middle.

>
> Leaving aside the fact that there are going to be patent-related issues
> in this stuff anyway (pharma, business methods etc). Also I don't
> understand how 'us' (the open data community) not using licenses and
> IP prevents everyone else from doing so? If you are worried about
> 'trolling' in data then it is going to happen whether we use licenses
> or norms or ....
>

Trolling can't happen on PD data. We can't fix the patent problem in
copyright or data licenses, sadly - disclosure and crowd review of prior art
is the best way I've seen - but if we can create a zone of unambiguously
free data, and use metadata to exclude unfree data, then trolling is
*eliminated*. You can't use a PD assertion on data and then sue for failed
attribution. That's the fundamental benefit as I see it...


>
> Regards,
>
> Rufus
>