[open-science] text-mining restrictions - a plea for more information

Thu Apr 28 20:58:38 UTC 2011

Hi All

Sorry, this is a little after the conversation has occurred, but I'm helping
Peter draft the text mining paper, so I spent some time over the Easter
weekend looking up a few license agreements where they were available, in
order to turn our 'anecdotal evidence' into real examples. There are quite a
lot of license agreements on the web (obviously without the contractual
details such as subscription fee, but that's not important for us). Some are
posted on the publishers' sites, others by institutions (well done, the Max
Planck Digital Library: http://www.mpdl.mpg.de/services/ezb-readme_en.htm).

Anyhow, here is my very non-systematic collection of 10 license agreements
from some of the larger academic publishers, including the CDL/Elsevier
one already mentioned (there are actually 11 entries, but 2 are the same
agreement).
https://spreadsheets.google.com/ccc?key=0AtV3tIqIu0UZdGVMNTAtejhBUlFySGk4QWdrVHJNdkE&hl=en&authkey=CKC-_LQP

3 don't mention data/text mining (which isn't to say there isn't an
additional clause in other versions of the agreement)
6 explicitly forbid it via various means
1 probably forbids it but isn't quite so explicit.

There are lots more out there, but this is a start at least.

Jenny

On Sun, Apr 17, 2011 at 10:21 PM, Tom Moritz <tom.moritz at gmail.com> wrote:

> In that contracts, in addition to copyright, represent a significant
> constraint on open access, the full exploration of such contractual
> conventions seems very useful to our common goal...
>
> Cliff suggests several clear paths to discovery of such contractual terms,
> and the compilation of a corpus of such documents would seem a very helpful
> community resource...?
>
> The CDL/Elsevier contract includes [@ Schedule 1.2(a), General Terms and
> Conditions, "RESTRICTIONS ON USAGE OF THE LICENSED PRODUCTS/ INTELLECTUAL
> PROPERTY RIGHTS", GTC1]: "Subscriber shall not use spider or web-crawling or
> other software programs, routines, robots or other mechanized devices to
> continuously and automatically search and index any content accessed online
> under this Agreement."
>
> Tom
>
> *Tom Moritz
> 1968 1/2 South Shenandoah Street,
> Los Angeles, California 90034-1208  USA
> +1 310 963 0199 (cell) [GMT -8]
> tommoritz (Skype)
> http://www.linkedin.com/in/tmoritz*
>
> “Πάντα ῥεῖ καὶ οὐδὲν μένει” (Everything flows, nothing stands still.) --*
> Heraclitus *
> "It is . . . easy to be certain. One has only to be sufficiently vague." --
> C.S. Peirce
> *"Il faut imaginer Sisyphe heureux."  ("One must imagine Sisyphus happy.")
> -- Camus
> *
>
>
> On Sun, Apr 17, 2011 at 4:06 PM, Clifford Lynch <cliff at cni.org> wrote:
>
>>  Two quick points on this.
>>
>> First, basically any of the contracts from state institutions in the US
>> are public record and can be obtained under state Freedom of Information Act
>> laws. In addition, there is a move underway within the ARL libraries (both
>> public and private) to stop writing contracts with non-disclosure clauses;
>> there's a feeling that greater transparency, both in financial terms and in
>> terms of usage conditions and restrictions, is desirable. I believe that
>> there was a piece in the Chronicle of Higher Education a couple of weeks ago
>> discussing a Cornell position statement on this.
>>
>> But having said this, usually the terms that the publishers are trying to
>> keep secret are financial; as you say, if there's a prohibition on mass
>> downloading of articles, it's pretty useless if people in the institutional
>> community are not aware of it. I would suspect that if you contact your
>> university library and ask them about contractual restrictions on bulk
>> downloading or crawling, they'll be quite forthcoming.
>>
>> I believe that such clauses are pretty commonplace.  They often deal with
>> both crawling and also with downloading "significant" portions of the
>> journal article databases onto local faculty or student machines.
>>
>> Clifford Lynch
>> Director, CNI
>>
>> At 19:07 +0100 04/17/11, Peter Murray-Rust wrote:
>>
>> On Sun, Apr 17, 2011 at 3:38 PM, Vision, Todd J <tjv at bio.unc.edu> wrote:
>>
>> Peter's draft whitepaper on text-mining is badly needed and nicely put.  I
>> was particularly interested in this passage:
>>
>> "The provision of journal articles is controlled not only by copyright but
>> also (for most scientists) the contracts signed by the institution. These
>> contracts are usually not public. We believe (from anecdotal evidence) that
>> there are clauses forbidding the use of systematic machine crawling of
>> articles, even for legitimate scientific purposes."
>>
>>
>>
>> Thank you very much for giving me further encouragement.
>>
>>
>>
>>
>> We have also heard tell of the existence of such clauses, but also have
>> not been able to secure first-hand evidence for them.  It would be very nice
>> to promote this from "anecdotal" to "documented", and I would like here to
>> put out a wider plea for anyone who might be able to provide the language of
>> these contractual restrictions.  Alternatively, I would welcome suggestions
>> for how we are to know what exactly we are prohibited from doing in light of
>> the confidential nature of the contracts.
>>
>>  I will take the decidedly unscientific step of assuming that this is
>> independent confirmation and that we should take this further.
>>
>>
>>
>> If copyright holders really wish to enforce such restrictions, it seems
>> odd that their very existence is little more than a rumor. Can secret
>> restrictions be legally enforced?
>>
>>
>>
>> IANAL but I think this depends on the legal jurisdiction. We continually
>> hear of contracts in many areas of activity where part of the contract is
>> that details may not be disclosed, so I expect it is legal. However I don't
>> know whether such gagging clauses are actually in force or whether not many
>> people are sufficiently interested to tell us.
>>
>>
>>
>> So there is one legal way to find out and I think it's appropriate. Before
>> doing it, it would be very useful to have more confirmation, as if this is
>> well known I don't want to waste people's time.
>>
>>
>>
>> So, please, can we have rapid responses to this question before I (and
>> possibly others) start stirring things yet again...
>>
>>
>>
>> P.
>>
>>
>>
>> --
>> Peter Murray-Rust
>> Reader in Molecular Informatics
>> Unilever Centre, Dep. Of Chemistry
>> University of Cambridge
>> CB2 1EW, UK
>> +44-1223-763069
>>
>>
>> _______________________________________________
>> open-science mailing list
>> open-science at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-science
>>
>>
>>
>