[open-science] text-mining restrictions - a plea for more information

Tom Moritz tom.moritz at gmail.com
Thu Apr 28 22:56:04 UTC 2011


Thanks Jenny -- very helpful...
Tom

*Tom Moritz
1968 1/2 South Shenandoah Street,
Los Angeles, California 90034-1208  USA
+1 310 963 0199 (cell) [GMT -8]
tommoritz (Skype)
http://www.linkedin.com/in/tmoritz*

“Πάντα ῥεῖ καὶ οὐδὲν μένει” (Everything flows, nothing stands still.) --*
Heraclitus *
"It is . . . easy to be certain. One has only to be sufficiently vague." --
C.S. Peirce
*"Il faut imaginer Sisyphe heureux."  ("One must imagine Sisyphus happy.")
-- Camus
*

     Please consider the environment before printing this e-mail




On Thu, Apr 28, 2011 at 4:58 PM, Jenny Molloy <jcmcoppice12 at gmail.com> wrote:

> Hi All
>
> Sorry, this is a little after the conversation has occurred but I'm helping
> Peter draft the text mining paper so I spent some time over the Easter
> weekend looking up a few license agreements where they were available, in
> order to turn our 'anecdotal evidence' into real examples. There are quite a
> lot of license agreements on the web (obviously without the contractual
> details such as subscription fee, but that's not important for us). Some are
> posted on the publishers' sites, others by institutions (well done, the Max
> Planck Digital Library http://www.mpdl.mpg.de/services/ezb-readme_en.htm)
>
> Anyhow, here is my very non-systematic collection of 10 license agreements
> from some of the larger academic publishers, including the CDL/Elsevier
> one already mentioned (there are actually 11 entries, but 2 are the same
> agreement).
>
> https://spreadsheets.google.com/ccc?key=0AtV3tIqIu0UZdGVMNTAtejhBUlFySGk4QWdrVHJNdkE&hl=en&authkey=CKC-_LQP
>
> 3 don't mention data/text mining (which isn't to say there isn't an
> additional clause in other versions of the agreement)
> 6 explicitly forbid it via various means
> 1 probably forbids it but isn't quite so explicit.
>
> There are lots more out there, but this is a start at least.
>
> Jenny
>
>
>
> On Sun, Apr 17, 2011 at 10:21 PM, Tom Moritz <tom.moritz at gmail.com> wrote:
>
>> In that contracts, in addition to copyright, represent a significant
>> constraint on open access, the full exploration of such contractual
>> conventions seems very useful to our common goal...
>>
>> Cliff suggests several clear paths to discovery of such contractual terms,
>> and the compilation of a corpus of such documents would seem a very helpful
>> community resource...?
>>
>> The CDL/Elsevier contract includes [@ "Schedule 1.2(a) General Terms and
>> Conditions", "RESTRICTIONS ON USAGE OF THE LICENSED PRODUCTS/ INTELLECTUAL
>> PROPERTY RIGHTS", GTC1] "Subscriber shall not use spider or web-crawling or
>> other software programs, routines, robots or other mechanized devices to
>> continuously and automatically search and index any content accessed online
>> under this Agreement."
>>
>> Tom
>>
>>
>>
>> On Sun, Apr 17, 2011 at 4:06 PM, Clifford Lynch <cliff at cni.org> wrote:
>>
>>>  Two quick points on this.
>>>
>>> First, basically any of the contracts from state institutions in the US
>>> are public record and can be obtained under state Freedom of Information Act
>>> laws. In addition, there is a move underway within the ARL libraries (both
>>> public and private) to stop writing contracts with non-disclosure clauses;
>>> there's a feeling that greater transparency, both in financial terms and in
>>> terms of usage conditions and restrictions, is desirable. I believe that
>>> there was a piece in the Chronicle of Higher Education a couple of weeks ago
>>> discussing a Cornell position statement on this.
>>>
>>> But having said this, usually the terms that the publishers are trying to
>>> keep secret are financial; as you say, if there's a prohibition on mass
>>> downloading of articles, it's pretty useless if people in the institutional
>>> community are not aware of it. I would suspect that if you contact your
>>> university library and ask them about contractual restrictions on bulk
>>> downloading or crawling, they'll be quite forthcoming.
>>>
>>> I believe that such clauses are pretty commonplace.  They often deal with
>>> both crawling and also with downloading "significant" portions of the
>>> journal article databases onto local faculty or student machines.
>>>
>>> Clifford Lynch
>>> Director, CNI
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> At 19:07 +0100 04/17/11, Peter Murray-Rust wrote:
>>>
>>> On Sun, Apr 17, 2011 at 3:38 PM, Vision, Todd J <tjv at bio.unc.edu> wrote:
>>>
>>> Peter's draft whitepaper on text-mining is badly needed and nicely put.
>>>  I was particularly interested in this passage:
>>>
>>> "The provision of journal articles is controlled not only by copyright
>>> but also (for most scientists) the contracts signed by the institution.
>>> These contracts are usually not public. We believe (from anecdotal evidence)
>>> that there are clauses forbidding the use of systematic machine crawling of
>>> articles, even for legitimate scientific purposes."
>>>
>>>
>>>
>>> Thank you very much for giving me further encouragement.
>>>
>>>
>>>
>>>
>>> We have also heard tell of the existence of such clauses, but also have
>>> not been able to secure first-hand evidence for them.  It would be very nice
>>> to promote this from "anecdotal" to "documented", and I would like here to
>>> put out a wider plea for anyone who might be able to provide the language of
>>> these contractual restrictions.  Alternatively, I would welcome suggestions
>>> for how we are to know what exactly we are prohibited from doing in light of
>>> the confidential nature of the contracts.
>>>
>>> I will take the decidedly unscientific step of assuming that this is
>>> independent confirmation and that we should take this further.
>>>
>>>
>>>
>>> If copyright holders really wish to enforce such restrictions, it seems
>>> odd that their very existence is little more than a rumor. Can secret
>>> restrictions be legally enforced?
>>>
>>>
>>>
>>> IANAL but I think this depends on the legal jurisdiction. We continually
>>> hear of contracts in many areas of activity where part of the contract is
>>> that details may not be disclosed, so I expect it is legal. However, I don't
>>> know whether such gagging clauses are actually in force or whether not many
>>> people are sufficiently interested to tell us.
>>>
>>>
>>>
>>> So there is one legal way to find out and I think it's appropriate.
>>> Before doing it, it would be very useful to have more confirmation, as if
>>> this is well known I don't want to waste people's time.
>>>
>>>
>>>
>>> So, please, can we have rapid responses to this question before I (and
>>> possibly others) start stirring things yet again...
>>>
>>>
>>>
>>> P.
>>>
>>>
>>>
>>> --
>>> Peter Murray-Rust
>>> Reader in Molecular Informatics
>>> Unilever Centre, Dep. Of Chemistry
>>> University of Cambridge
>>> CB2 1EW, UK
>>> +44-1223-763069
>>>
>>>
>>> _______________________________________________
>>> open-science mailing list
>>> open-science at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/open-science
>>>
>>>
>>>
>>>
>>
>>
>