[Open-access] Fwd: Text and information-mining from content in Wiley journals

Naomi Lillie naomi.lillie at okfn.org
Wed Apr 25 14:16:25 UTC 2012


Response from, and correspondence with, Wiley



---------- Forwarded message ----------
From: Peter Murray-Rust <pm286 at cam.ac.uk>
Date: Fri, Mar 16, 2012 at 5:28 PM
Subject: Re: Text and information-mining from content in Wiley journals
To: "Campbell, Duncan - Oxford" <dcampbell at wiley.com>
Cc: "pk219 at cam.ac.uk" <pk219 at cam.ac.uk>, "emc59 at cam.ac.uk" <emc59 at cam.ac.uk>,
Peter Morgan <pbm2 at cam.ac.uk>




On Fri, Mar 16, 2012 at 11:33 AM, Campbell, Duncan - Oxford <
dcampbell at wiley.com> wrote:

>  Peter
>
> Thanks for your note. Please feel free to publish our correspondence on
> your blog.
>
> As I said in my original mail, we would welcome the opportunity to work
> with you on a specific project that would enable us to gain further
> experience in enabling and supporting text and data mining of
> Wiley-published content.
>
> I’d certainly be willing to discuss a license for a pilot TDM project with
> a subset of our journals in order to establish how best we can enable
> access to our content for mining purposes. I think we would also need to
> involve the UL in any discussions, as they are the license-holder for Wiley
> content subscribed to by Cambridge.
>
> I’d also like to get a better understanding of how you plan on processing
> content (i.e. what you mean when you say ‘extract all the chemical facts
> and do research on them’), and in particular how the outputs of that
> processing will be distributed (i.e. what you mean when you say you want to
> be able to ‘publish the data on which the science is based’).
>
> You mention that you are keen to start on projects straight away, so
> please do let me know the specifics of a project you would like to work on
> and we can then take this forward.
>
>
Thanks

I am pulling all this together at the weekend, I hope.

One immediate observation is that it is a lot of work to manage several
publishers, all with different APIs. We've had to do this with
crystallography and write a different subcrawler for each journal. If all
publishers exposed their content through, say, Atom (or OAI-PMH), it would
be a lot easier technically.



> I look forward to hearing from you.
>
> Duncan
>
>
>
>
> Duncan Campbell
> Wiley-Blackwell
>
>  *From*: Peter Murray-Rust [mailto:pm286 at cam.ac.uk]
> *Sent*: Friday, March 09, 2012 05:07 PM
>
> *To*: Campbell, Duncan - Oxford
> *Cc*: Patricia Killiard <pk219 at cam.ac.uk>; Edmund Chamberlain <
> emc59 at cam.ac.uk>
> *Subject*: Re: Text and information-mining from content in Wiley journals
>
>
>
> On Fri, Mar 9, 2012 at 4:16 PM, Campbell, Duncan - Oxford <
> dcampbell at wiley.com> wrote:
>
>> Peter
>>
>
> I've copied in my colleagues in the Cambridge University Library
>
>> Thanks for getting in touch. We would be happy to discuss your specific
>> requirements for text-mining Wiley content, and how we can work with you to
>> enable mining in a mutually-acceptable manner.
>>
>
> Excellent. You'll appreciate that this is a matter of great public
> interest at present and an opportunity to show how helpful publishers are,
> so I'll be posting the correspondence on my blog.
>
> I don't have specific requirements. I have the technology to extract facts
> from Wiley publications and do scientific research on them and I'd like to
> do that. In the first instance I'll analyze which journals contain
> chemistry and extract all the chemical facts and then do research on them.
> Since the data are factual there is no question of copyright being violated.
> As our group is the leading creator of Open Source information-mining
> software for chemistry and we are regarded as among the world's experts I
> have a large number of collaborators. There are a large number of projects
> already but we add at least one a week so there's no point in burdening you
> with the details. Here are just 5 to show you the power.
>
>    - scanning the literature for potential antimalarial compounds (Mat
>    Todd). We have to search for every compound as there is no golden rule for
>    finding drugs against this killer disease
>    - finding second harmonic generators for solar panels, leading to
>    increased energy efficiency and greenness for the planet
>    - Computing the human metabolome. Again we have to find all instances
>    where compounds have been mentioned that might be human metabolites
>    - Improving the eco-friendliness of chemical reactions. What solvents
>    have been used in what reactions? Can we use solvents that are more
>    friendly to the planet? Again we need to look at every reaction.
>    - Improving the accuracy of computational chemistry. There are billions
>    of dollars spent on trying to predict the structure of matter. We want to
>    find every paper and find the most cost-effective methods
>
>  There are also many added benefits in scientific information-mining
> research itself where I am an acknowledged world expert (sorry to sound
> boastful, it's just to assure you I know what I'm doing).
>
> I'm not asking you to get involved in any of the technical details and we
> don't need any special technology from the publisher, any special versions
> of the articles or any APIs. There is no need to involve CUL in details.
> All we need is:
>
>
>    1. To download and analyze, using machines, papers from Wiley journals
>    to which we have subscriptions (we use web-friendly crawling protocols)
>    2. An assurance from Wiley that you will not impose technical and
>    legal/contractual barriers.
>    3. To be able to publish the data on which the science is based
>    (science without data is almost worthless as you know)
>
> We give you an assurance that we shan't deliberately publish any copyright
> material such as the complete verbatim Version of Record.
>
>> We are keen to enhance the usage of our journal content by encouraging
>> text and data mining, and welcome the opportunity to work on a specific
>> project with you that would enable us to gain further experience in this
>> area.  As you’ll appreciate, at this stage there are still questions around
>> access, processing and distribution of the outputs of text mining, which
>> Wiley, in common with most other STM publishers, is working through.
>>
>> I look forward to hearing from you further.
>>
>
> There is an urgency. We are keen to start some of these projects within a
> day or two as we want to present to the Hargreaves enquiry how valuable
> text-mining can be. We therefore only need from you an assurance that we
> can employ factual mining and to get into the report we'll need this by
> 2012-03-14. I am afraid promises of intent are worthless at this stage.
> There is only one acceptable answer:
>
> YES - you can go ahead without further permission from Wiley
>
> anything else, I'm afraid will be a NO for Hargreaves.
>
>
>
>> Duncan
>>
>> *From:* peter.murray.rust at googlemail.com [mailto:
>> peter.murray.rust at googlemail.com] *On Behalf Of *Peter Murray-Rust
>> *Sent:* 09 March 2012 09:34
>> *To:* Campbell, Duncan - Oxford
>> *Subject:* Fwd: Text and information-mining from content in Wiley
>> journals
>>
>> I gather that Bob(sic) Campbell has copied the following message to you,
>> but I haven't heard back. Please can I ask you to respond to it?
>>
>> Thanks
>>
>> Peter
>>
>> ---------- Forwarded message ----------
>> From: *Peter Murray-Rust* <pm286 at cam.ac.uk>
>> Date: Wed, Mar 7, 2012 at 8:40 AM
>> Subject: Text and information-mining from content in Wiley journals
>> To: Robert Campbell <bcampbel at wiley.com>
>>
>>
>> Dear Bob Campbell,
>> We were at the meeting last week in Oxford on the "Evolution of
>> Scholarship" where you stated that anyone could mine content in Wiley
>> journals for factual information, and re-use and republish it. Cambridge
>> subscribes to many Wiley journals and I and many other scientists wish to
>> mine factual information using machines.
>>
>> We cannot do this at present as Wiley imposes two barriers:
>>
>>    - legal restrictions on text-mining through contracts (Wiley has in
>>    the past threatened scientists with legal action for extracting facts)
>>    - Wiley's server-side robots which will shut off the University if we
>>    attempt to download publications automatically.
>>
>> *I would therefore like you (immediately, as we wish to start
>> immediately) to confirm that Wiley will absolutely and for ever allow
>> subscribers, at no additional cost, to mine all content for facts in both
>> back issues and current publications as soon as they appear.*
>>
>> Answering "YES" to this question is all that is required. Any other
>> answer, including the request for discussion, will be taken as "NO". Please
>> reply by the end of today (2012-03-07).
>>
>> I have published a background document (
>> http://blogs.ch.cam.ac.uk/pmr/2012/03/04/information-mining-and-hargreaves-i-set-out-the-absolute-rights-for-readers-non-negotiable/) which also gives a wide range of illustrations of factual information. In
>> places it reads "Elsevier", please substitute "Wiley".
>>
>> Please note that we do not need any help from Wiley in systematically
>> downloading papers. We shall use a delay of 1 second between downloads and
>> we shall not re-publish verbatim the papers we download.
>>
>> Thank you and I look forward to your immediate reply and agreement.
>>
>> --
>> Peter Murray-Rust
>> Reader in Molecular Informatics
>> Unilever Centre, Dep. Of Chemistry
>> University of Cambridge
>> CB2 1EW, UK
>> +44-1223-763069
>>
>>  ------------------------------
>> Blackwell Publishing Limited is a private limited company registered in
>> England with registered number 180277.
>> Registered office address: The Atrium, Southern Gate, Chichester, West
>> Sussex, United Kingdom. PO19 8SQ.
>> ------------------------------
>>
>>
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>
>


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069



-- 
Naomi Lillie
Foundation Administrator and Community Coordinator (Open Bibliography)
Open Knowledge Foundation
http://okfn.org/
Skype: n.lillie