[open-science] feedback wanted on text-mining initiatives

Peter Murray-Rust pm286 at cam.ac.uk
Fri Apr 27 14:25:48 UTC 2012


On Fri, Apr 27, 2012 at 2:28 PM, Richard Kidd <KiddR at rsc.org> wrote:

> On Fri, Apr 27, 2012 at 1:40 PM, Richard Kidd <KiddR at rsc.org> wrote:
>
> > > Among the things which we probably should not address are:
> > > * what can and cannot be mined and reproduced
>
> Apols for the misunderstanding, my fault for using ‘open’ in a different
> context - my request is that “what can and cannot be mined and reproduced”
> *should* be discussed and addressed – as it’s the key issue. Am trying
> to get comments up on yr blog post but it’s not behaving…
>

Thanks and understood. It is critical that it *is* discussed, and very
possibly on the OKF site.

The point here is to create a declaration about text-mining similar to
Budapest/Berlin/Bethesda [for Open Access]. They deliberately do not go
into details, but state a goal that can later be reified in law and
practice. They state what "Open Access" means in general terms. Phrases
like "for whatever purpose", "everybody", "without further permission".
They do NOT state that there should be a licence - licences are simply one
way of implementing them.

Let us call the declaration on text-mining the "Open Text Mining
Declaration". (It's slightly but not very contaminated by NPG's "Open
Text-mining Initiative", which most people have forgotten.) It should be
brief - perhaps 2-3 lines at most. It would define Open Text-mining...

"By Open textmining we mean ... everyone ... without further permission ...
available to all ...".

That does not mean that everyone must agree to do it. It is a goal. The BBB
declarations are not yet implemented universally. But they are the
yardstick that most of us use. They are particularly useful because so many
people and organisations create their own usage for "Open Access" without
defining it - thus causing confusion. We wish to avoid this for this new
field.

The details come second, and change as the world and technology change. It
is generally agreed that CC-BY licences permit text- and other mining
without further permission. Contrast that with almost everything else where
nothing is clear.

If 20 scientists per university wish to text-mine, that means 1000
universities * 20 scientists * 100 publishers == potentially 2 million
requests. The system can't cope. So the only ways forward are:
* refuse everything. There seem to be publishers who take this view. It has
the virtue of clarity
* permit everything. There are certainly publishers (BMC/PLoS) who take
this view
* leave everything unclear. "consult your librarian" "we'll discuss this
with our marketing people". That's the position for most publishers.
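The back-of-envelope estimate of permission requests can be written out as a short Python sketch. It assumes, as a simplification, that each scientist sends exactly one request to each publisher; the function names are hypothetical and used only for illustration.

```python
# Hypothetical sketch: count permission requests if every text-miner
# must ask every publisher individually (one request per pair, assumed).

def permission_requests(universities: int, scientists_per_uni: int, publishers: int) -> int:
    """Total requests when each scientist asks each publisher once."""
    return universities * scientists_per_uni * publishers


def requests_per_publisher(universities: int, scientists_per_uni: int) -> int:
    """Requests a single publisher would have to answer under the same assumption."""
    return universities * scientists_per_uni


total = permission_requests(1000, 20, 100)
per_pub = requests_per_publisher(1000, 20)
print(f"total requests: {total:,}")            # total requests: 2,000,000
print(f"requests per publisher: {per_pub:,}")  # requests per publisher: 20,000
```

Even a single publisher would face 20,000 individual requests under these figures, which is the load the message argues the system cannot handle.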

Fuzziness is destroying scientific progress and creating tensions. The OTMD
is an attempt to bring some clarity. Whether any given publisher accepts,
rejects or ignores it is irrelevant to the wording of the declaration.

P.


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

