[Open-access] Fwd: Information mining RSC articles

Naomi Lillie naomi.lillie at okfn.org
Wed Apr 25 14:44:20 UTC 2012


PMR's initial e-mail to, and the response from, the Royal Society of
Chemistry (see also
http://blogs.ch.cam.ac.uk/pmr/2012/03/13/permission-for-information-mining-update-and-response-from-royal-society-of-chemistry/
)



---------- Forwarded message ----------
From: Peter Murray-Rust <pm286 at cam.ac.uk>
Date: Mon, Mar 12, 2012 at 2:53 PM
Subject: Re: Information mining RSC articles
To: Richard Kidd <KiddR at rsc.org>
Cc: Patricia Killiard <pk219 at cam.ac.uk>, Edmund Chamberlain <emc59 at cam.ac.uk>,
Peter Morgan <pbm2 at cam.ac.uk>, Antony Williams <WilliamsA at rsc.org>, David
James <JAMESD at rsc.org>


Thank you Richard.

On Mon, Mar 12, 2012 at 2:28 PM, Richard Kidd <KiddR at rsc.org> wrote:

>  Dear Peter
>
>
>
> Thanks for your request. It’s good to see from this and the accompanying
> blog post you still have some positive memories of text mining with some
> publishers. So far, we have mainly supplied articles for academic text
> mining purposes as one-off deliveries – such as for the SESL project, and
> the 50,000+ articles we supplied to both the ChETA and TREC Chem projects.
> Often it is easier for miners to bulk load within their own systems than
> crawling to collect, but we recognise that times are changing.
>
>
>
> We ask you talk to your librarian colleagues, both in terms of them being
> happy with what you’re doing under the agreed licenses with RSC, and so
> they understand what ongoing value the results of any mining exercise
> derives from the RSC subscription.
>

This is a relatively clear answer. It puts the onus on the librarians rather
than RSC (who drafted the licence) and it prevents useful scale. Every
researcher in every institution has to find the right librarian to ask and
hope there is consistency. In fact librarians almost always err against
offending copyright so will say no.

>
>
> This ongoing value issue is important in terms of text mining implications
> for us. Along with most publishers we supply counter stats to librarians of
> usage within their institution – and, as you know, when renewal time comes
> these are used to judge which journals are of most value. Our concern is if
> the mining extracts and republishes sufficient content from the
> publications as to reduce apparent usage (and citation) of the published
> papers in future. At the moment full text downloads are the major measure
> we have (rightly or wrongly in principle) for the librarian to judge if
> publications are of value to the institution, and republication of
> extracted facts and data at least potentially could affect this. Done
> right, the effect can be positive, but it could also be detrimental.
>
>
I understand the argument. The same could be said of Chemical Abstracts and
Beilstein / Reaxys. We aren't proposing anything radically different.

>
>
> Some of Cameron’s suggested principles of research data mining would have
> been a valuable addition to your proposed non-negotiables, to reduce
> concerns that future derived would reduce usage of the original papers by
> your institution and others:
>
> * Always link back to the version of record of the research output you
> have mined.
>

That's fair.


>  * Include elements and snippets by reference, not by value.
>

This fails whenever the publisher changes their URL system.


>  * Restrict content replication to that reasonably allowed by Fair Use
> provisions or enabled by licences, and required for efficient services
>

This is the problem. There is no fair use in this jurisdiction - it is by
consent or by challenge in court.


> * Only redistribute content where copyright terms explicitly allow it
>

Same problem. Copyright terms don't cover this area explicitly.


>  * Respect API service limits where posted and develop polite tooling
> with exponential back-off where appropriate
>

Agreed. Again some public information on this would be useful.


>  (a couple of principles deleted, due to non-relevance to this specific
> question rather than disagreement)
>
>
>
> Finally, a correction. You say we cut off access a few years ago. My
> recollection is slightly different and I have the correspondence if you’d
> like to see it, from 2006. We didn’t cut you off, though we suggested we
> would block one IP address if the downloading continued without any
> contact. We discussed it amicably – explanation made it clear and the
> download behaviour was modified for both sides to be happy with
> continuation. But it’s an excellent illustration of why we appreciate being
> asked about the approach – as in this case the downloader was trying to
> retrieve non-existent issues, filling our developers’ mailboxes with 404
> alerts. So while you think we’re only concerned about server load with
> on-demand mining, you can end up killing other systems we have to improve
> customer service. Mike Taylor clearly values publishers who try to stay on
> top of broken links ;-)
>
>
Let's agree to park this - our memories differ.

>
>
> I would also ask that you include our response verbatim if you are using
> it in any of your Hargreaves submissions, and of course we will be
> preparing our own submission.
>
>
We would like to do this - there may be a limit on size.

>
>
> In summary, we would strongly appreciate discussion on the extent of the
> factual information you intend to republish (I have seen the examples on
> the blog), together with the involvement of your librarian colleagues in
> the process – for current agreements, and effects on future usage and value
> measures.
>
>
>
The librarians have been copied in to all this correspondence.

Best wishes
>
>
>
> Richard
>
>
P.


>
>
>
>
> *From:* peter.murray.rust at googlemail.com [mailto:
> peter.murray.rust at googlemail.com] *On Behalf Of *Peter Murray-Rust
> *Sent:* 10 March 2012 08:13
> *To:* David James; Richard Kidd
> *Cc:* Patricia Killiard; Edmund Chamberlain; Peter Morgan; Antony Williams
> *Subject:* Information mining RSC articles
>
>
>
> David, Richard,
> We are preparing a response to the Hargreaves report about information
> mining from scientific publications. As you know we have developed a world
> class set of Open Source tools for chemical information extraction, some of
> them with your support - for which public thanks!
>
> We are now in the position where we can extract factual chemical
> information from the full text of articles with high precision and recall
> (OPSIN accuracy is > 99.5% and recall > 95%) and with great speed and
> cost-effectiveness. The University of Cambridge is a subscriber to RSC
> journals and we would like to begin to extract information on a systematic
> basis for Open scientific research. We don't need technical help or
> permission from the RSC. We have copied Cambridge University Library staff.
>
> This mail is to ask your assurance that we can do this without (a)
> legal/contractual barriers from RSC and (b) that we shall not be cut off by
> RSC robots (unfortunately this happened some years ago). We wish to start
> immediately to show Hargreaves the benefit of information mining - they
> have a deadline for 2012-03-21 so we would like your agreement by
> 2012-03-15. All we require is:
>
> *YES: you may mine and publish factual information from RSC journals
> without additional payment and without restriction from legal and technical
> barriers.
> *
> I hope you can trust me to act responsibly on not violating copyright and
> being considerate to your robots. I have set out more details and a
> non-exhaustive illustration of facts in
> http://blogs.ch.cam.ac.uk/pmr/2012/03/04/information-mining-and-hargreaves-i-set-out-the-absolute-rights-for-readers-non-negotiable.
>
> Unfortunately any other reply than YES by 2012-03-15 will be regarded as
> unacceptable for the purposes of Hargreaves.
>
> You will note that we are also approaching other major publishers of
> chemistry. Elsevier has already publicly said we can mine their content for
> research and we'll be publishing the facts under an Open licence. This
> means that Chemspider (Tony Williams copied) can immediately use all this
> information in the Chemspider resource.
>
> Best wishes,
>
> Peter
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>
> DISCLAIMER:
>
> This communication (including any attachments) is intended for the use of
> the addressee only and may contain confidential, privileged or copyright
> material. It may not be relied upon or disclosed to any other person
> without the consent of the RSC. If you have received it in error, please
> contact us immediately. Any advice given by the RSC has been carefully
> formulated but is necessarily based on the information available, and the
> RSC cannot be held responsible for accuracy or completeness. In this
> respect, the RSC owes no duty of care and shall not be liable for any
> resulting damage or loss. The RSC acknowledges that a disclaimer cannot
> restrict liability at law for personal injury or death arising through a
> finding of negligence. The RSC does not warrant that its emails or
> attachments are Virus-free: Please rely on your own screening. The Royal
> Society of Chemistry is a charity, registered in England and Wales, number
> 207890 - Registered office: Thomas Graham House, Science Park, Milton Road,
> Cambridge CB4 0WF
>



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069







-- 
Naomi Lillie
Foundation Administrator and Community Coordinator (Open Bibliography)
Open Knowledge Foundation
http://okfn.org/
Skype: n.lillie