[open-bibliography] Fwd: [okfn-discuss] University of Ghent Library Catalogues: Open or Not?

Fri Mar 26 14:44:49 UTC 2010

Teaching the controversy is the best joke this running argument's had in ages!

As Rufus noted, we do have a long-running and lively, but friendly,
disagreement on these issues. I'm getting ready to hop onto a very
long flight and don't have time to compose a proper answer, but will
hammer out what I can.

Short version: data isn't the same as data isn't the same as data, and
it's unclear whether metadata are data. In many cases it simply
depends on the eye of the beholder. The Science Commons (SC) protocol
was a little closed-minded on this front, and the Panton Principles
(PP) are a recognition of that (moving back to "should" from "must" in
prescribing the public domain). For some kinds of metadata (ontologies
and such), we've tended to encourage treating them as regular
copyrightable objects rather than data - thus, when we talked to the
New York Times about their news vocabulary, they used CC BY rather
than a data tool like CC0. See http://data.nytimes.com for an example
of releasing a SKOS vocabulary.

But when we look at library bibliographic data, we do tend to think
that the data itself is not copyrightable, and encourage a Panton
Principles approach. The University of Cologne adopted one last week,
for example, in releasing 5.4M database records under PP-compliant
terms: http://www.hbz-nrw.de/dokumentencenter/presse/pm/datenfreigabe_engl

The semantic web thing will take me more time, but the short version
of that is: copyright licenses are triggered by the making of a copy
or a substantial extraction. Semantic web works by lots and lots of
tiny extractions creating lots and lots of aggregated resources.
Either you force everyone involved to replicate your licensing regime
for uses not contemplated by the licenses (i.e., no copying or
"substantial" extraction was involved, but since the end result is
that thousands of machines combing the data web can end up
re-assembling your data, you ask for rights on every query returned
out of the system) or you create a false perception that the data are
"protected" by the license. The NYT fellows started out thinking
share-alike was what they wanted, but after they went to see TimBL, he
sent them down the path of imposing fewer restrictions because it makes
for a better web of linked data if you don't have to add licensing
rules to every triple.
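
To make the point about tiny extractions concrete, here is a minimal
sketch in Python. Everything in it is made up for illustration: the
four triples, the query() and is_substantial() helpers, and above all
the 50% cut-off, which is an assumption for the sketch and not a legal
test of what counts as "substantial".

```python
# Toy illustration only: the triples, the threshold, and the function
# names are hypothetical, and "substantial" here is not a legal test.

DATASET = {
    ("book:1", "dc:title", "Moby-Dick"),
    ("book:1", "dc:creator", "Herman Melville"),
    ("book:2", "dc:title", "Walden"),
    ("book:2", "dc:creator", "Henry David Thoreau"),
}

SUBSTANTIAL_FRACTION = 0.5  # assumed cut-off for this sketch


def query(subject, predicate):
    """Return the triples matching one subject/predicate pair."""
    return {t for t in DATASET if t[0] == subject and t[1] == predicate}


def is_substantial(result):
    """True if a single result covers at least the assumed fraction."""
    return len(result) / len(DATASET) >= SUBSTANTIAL_FRACTION


def crawl():
    """Simulate many independent clients, each asking one tiny question."""
    aggregated = set()
    any_substantial = False
    for subject in ("book:1", "book:2"):
        for predicate in ("dc:title", "dc:creator"):
            result = query(subject, predicate)
            any_substantial = any_substantial or is_substantial(result)
            aggregated |= result
    return aggregated, any_substantial


aggregated, any_substantial = crawl()
print(any_substantial)        # False: no single query crossed the cut-off
print(aggregated == DATASET)  # True: together they rebuilt the whole set
```

No individual query is "substantial" under the assumed cut-off, yet
the aggregate is a complete copy of the data, which is the situation a
copy-triggered license does not obviously reach.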

As with all things technical, your mileage may vary, and what matters
in the end is that Rufus and I do agree on the most important
thing - sharing data is essential, however you do it.

jtw

On Fri, Mar 26, 2010 at 6:38 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> On 26 March 2010 04:38, John Wilbanks <wilbanks at creativecommons.org> wrote:
>> David,
>>
>> Thanks for copying me on this.
>>
>> It's obviously up to E-Prints, but I'd encourage them to look at the Panton
>> Principles, onto which both Creative Commons and OKF staff signed. They
>> recommend using the public domain (either via a CCZero tool or the OKF's
>> PDDL) on publicly funded science data, and it'd be a lost opportunity not
>> to encode PP-compliant principles into a piece of software that will likely
>> contain a lot of publicly funded science data.
>
> As I understand it, this is a license for the EPrints *metadata*, not
> for the scientific data itself. The Panton Principles are focused on
> the licensing of the data associated with (publicly-funded) research.
>
> In general, our primary recommendation is that data (any kind of data)
> be open as specified by http://opendefinition.org/. Being open in this
> sense includes public domain (e.g. CCZero or the PDDL) but also allows
> for attribution and sharealike type licenses -- the important point is
> that we can allow attribution and sharealike and still have
> interoperability.
>
> Obviously a PD approach places the fewest restrictions on re-users but
> one also has to think about the providers of the data here. For many
> people providing data, attribution or sharealike provisions may be
> things they really want and which make the difference between them
> providing data and not providing it. (and community norms, IMO, don't
> make a real difference here [1]).
>
> [1]: http://blog.okfn.org/2009/02/02/open-data-openness-and-licensing/
>
> For example, I've heard from several providers of bibliographic data
> that they are especially concerned about sharing-back, particularly by
> commercial players in the sector. This is a legitimate concern.
> Share-alike provides a really valuable option here in simultaneously
> addressing this concern but not excluding commercial use (which could
> be very useful and important).
>
> I would also note that for many "grass-roots" projects sharealike
> seems to be important, with many of the best-known grass-roots
> "open" projects using sharealike licenses (Linux kernel, Wikipedia,
> OpenStreetMap etc).
>
> In the case of the Panton Principles they are dealing with "data
> related to published science", especially science that is publicly
> funded. In this area, the case for having absolutely minimal
> restrictions on reusers is a strong one (the science is already
> getting up-front funded by the state). This is why, at least from the
> OKF point of view, the Panton Principles go on from a basic
> recommendation of openness (principles 1-3) to an explicit
> recommendation of public domain only in principle 4.
>
> Here, we are talking about bibliographic metadata which is outside the
> scope of the PP.
>
>> I'd also point out that data license regimes that impose share-alike on data
>> can create expectations that 1) conflict with privacy restraints on data,
>> which often exist on publicly funded scientific data sets (anything touching
>> a human subject, for example) and 2) depend on the act of *copying*  - in a
>
> Share-alike can clearly have problems with privacy restraints -- or
> with any other constraints such as people wishing to proprietize the
> data :)
>
> In any case we are basically talking about bibliographic data here
> where privacy rights really aren't ever going to be an issue.
>
>> semantic web world, query is the dominant regime, rather than a copy. Last,
>
> I'm a little unclear about this query versus copy issue and why
> share-alike has a problem here.
>
>> of course, any licensing regime that depends on contract carries the burden
>> of the derivative database providers making a contract offer and the users
>> agreeing to the contract, which over time can impose significant transaction
>> costs.
>
> I'm not sure of the relevance of this comment. Almost all licenses,
> whether they are normal CC ones for content or Open Data Commons ones
> for data, have an element of contract in them, at least in common law
> jurisdictions. See, for example, this comment from Mia Garlick (who
> was CC's general counsel):
>
> http://lists.ibiblio.org/pipermail/cc-licenses/2006-April/003504.html
>
> ODC licenses may be more explicit about this but they aren't really
> any different.
>
>> All of these went into the CC work on data licensing and the public domain,
>> and are reflected in the Panton Principles. We've done quite a bit of
>> legwork and legal analysis of the risks and benefits and would be happy to
>> brief you at your leisure. We arrived at our position after three years of
>> research, and after beginning with a real desire to implement a share-alike
>> regime for data.
>
> As John knows, we don't share these views on licensing of data, at
> least in general :) For more on this see:
>
> <http://blog.okfn.org/2009/02/02/open-data-openness-and-licensing/>
> <http://blog.okfn.org/2009/02/09/comments-on-the-science-commons-protocol-for-implementing-open-access-data/>
>
> Rufus
>
>> On 3/25/10 5:41 PM, David Shotton wrote:
>>>
>>> Dear Christopher,
>>>
>>> There has been considerable discussion, I understand, between OKF and CC
>>> about data licenses, which may be still ongoing. CC is now recommending
>>> their "CCZero" license for datasets, since this avoids the potential
>>> problem of managing the "attribution stacking" that would result from
>>> automatic data aggregation from several sources using conventional
>>> attribution licenses of any kind. You are advised to take these
>>> arguments into consideration before deciding what to do for EPrints.
>>>
>>> For FlyTED (http://www.fly-ted.org), which as you know uses the EPrints
>>> software platform, we have, after discussions with John Wilbanks,
>>> adopted CCZero licenses for metadata and thumbnails, and normal CC
>>> attribution licences for high resolution images.
>>>
>>> Dryad (http://datadryad.org/repo), which is a repository for datasets
>>> linked to journal articles, has also adopted CCZero licenses.
>>>
>>>
>>>
>>> Christopher Gutteridge wrote:
>>>>
>>>> For reference, we're planning to release the next version of EPrints
>>>> with the following default configuration.
>>>> ---[snip---
>>>> # Licence to be inserted into all RDF data.
>>>>
>>>> # We suggest using Open Data Commons licenses as these are more
>>>> # suitable for data than normal Creative Commons. Follow the license
>>>> # URLs for more information.
>>>>
>>>> # $c->{rdf}->{license} ="http://www.opendatacommons.org/licenses/by/";
>>>> # OR
>>>> # $c->{rdf}->{license} ="http://www.opendatacommons.org/licenses/odbl/";
>>>>
>>>> # $c->{rdf}->{attributionName} = ".....";
>>>> # $c->{rdf}->{attributionURL} = ".....";
>>>> ---[snip---
>>>>
>>>> Suggestions are welcome... We release in a few days.
>>>>
>>>>
>>>> Rufus Pollock wrote:
>>>>
>>>>> On 25 March 2010 00:36, Mathias Schindler <mathias.schindler at gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>> On Mon, Mar 1, 2010 at 11:52 AM, Rufus Pollock <rufus.pollock at okfn.org>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>> Thanks Lukas, that's incredibly useful to know and I'll try getting in
>>>>>>> touch with them direct.
>>>>>>>
>>>>>>>
>>>>>> The license information seems to have changed.
>>>>>>
>>>>>> http://lib.ugent.be/info/en/exports.shtml
>>>>>>
>>>>>>
>>>>> Indeed. After we found the exports, I was put in touch with Patrick
>>>>> Hochstenbach (now on this list), the man behind those exports. After a
>>>>> bit of discussion he kindly agreed to relicense under the
>>>>> http://opendefinition.org/  compliant ODbL.
>>>>>
>>>>> I meant to notify the list about this but obviously must have forgotten!
>>>>>
>>>>>
>>>>>
>>>>>> It is now "All Meercat exports with the Open Knowledge icon are made
>>>>>> available under the Open Database License:
>>>>>> http://opendatacommons.org/licenses/odbl/1.0/." instead of some -nc
>>>>>> closeness.
>>>>>>
>>>>>> Sweet!
>>>>>>
>>>>>>
>>>>> Very much so :)
>>>>>
>>>>> Rufus
>>>>>
>>>>>
>>>>
>>>>
>>> --
>>>
>>> Dr David Shotton david.shotton at zoo.ox.ac.uk
>>> <mailto:david.shotton at zoo.ox.ac.uk>
>>> Reader in Image Bioinformatics
>>>
>>> Image Bioinformatics Research Group http://ibrg.zoo.ox.ac.uk
>>> Department of Zoology, University of Oxford tel: +44-(0)1865-271193
>>> South Parks Road, Oxford OX1 3PS, UK fax: +44-(0)1865-310447
>>>
>>
>
>
>
> --
> Open Knowledge Foundation
> Promoting Open Knowledge in a Digital Age
> http://www.okfn.org/ - http://blog.okfn.org/
>