[open-science] [SCHOLCOMM] Libre open access, copyright, patent law, and other intellectual property matters

Peter Murray-Rust pm286 at cam.ac.uk
Thu Mar 22 17:12:10 GMT 2012

On Thu, Mar 22, 2012 at 3:35 PM, john wilbanks <jtw at del-fi.org> wrote:

> I realize that I didn't make my point clear enough actually.
> And I don't lump Heather in with Harnad. Heather asked a good question
> that I answered obliquely. For that I apologize.

I was probably simplistic as well. But, unfortunately, there is a large
section of academia that creates "Open Access" policy, implicitly and
explicitly, and very little of it is informed by scientists. There are 1000+
institutional repositories (Peter Suber's figure) with ca. 2 FTE per
repository, i.e. more than 2000 FTEs, and very little of this investment is
informed by scientists' needs.

In my submission to the Hargreaves process I said very much the same as John:

> I do not just want the ability for academics to text mine.


> I want there to be a robust market for text mining that includes companies
> who mine open access content for their own reasons as well as academics,
> and I want there to be a robust market of startups who provide those text
> mining services (and thus must make and distribute copies of corpuses as
> validation sets, as part of collaborations with academics that improve
> algorithms, and who also produce and sell the outputs of text mining).
> Right now text mining pretty much sucks, frankly, compared to what it ought
> to be.

Totally agreed. I have wasted 3 years of my research life.

> Non commercial licenses are not just a way to prevent other publishers
> from reselling content, which is often the focus of the conversation, but a
> tax on startups and companies who want to treat the literature as data.
> Here's a short list of companies trying to do just that who are being
> hamstrung by closed access, and who would be blocked under NC terms:
> Personalized Medicine (providing auto-annotation of genotypes to doctors'
> offices), Selventa (providing auto-created hypotheses explaining high
> throughput experimental biological data), Ingenuity (providing large
> databases of assertions specific to diseases or tissues). Those three are
> simply the first ones that jump to mind in startup land. There are ~20 more I
> know of, and many more that I don't.

Exactly. The whole point is that a small, bright group can create great
tools and information in months. Three members of our group have done this,
but none of their tools can be properly deployed because we have to fight
restrictive practices.

> The uncertainty around content chills venture investment, to boot. If the
> web had been NC-licensed, we would not have Google, and PageRank would have
> remained where it started, as an academic theory experiment. That would
> suck, in my opinion. And the big publishers know this, which is precisely
> why they add clauses that ban mining to existing licenses and want
> commercial restrictions. I don't think they're worried about resale. I
> think they're worried about getting their lunch eaten by new entrants who
> see the market differently, as Apple did with music, as Google did to
> Microsoft (and in turn Facebook did to Google). That's why Elsevier has an
> entire unit devoted to this stuff, run by extremely smart people.

Yes. It is a serious mistake to assume Elsevier is stupid or incompetent. I
suspect they are unprepared in some technical areas as they hope to avoid
having to deploy

> Then there are all of big pharma and biotech, who all maintain libraries and
> subscriptions, but are often absent from these discussions because of their
> position on patents.
> Non commercial restrictions have *side effects* that are bad for
> innovation and bad for science. We need entrepreneurs and not just
> academics.

And they lead to bad decision-making at all levels of science: the
information doesn't get through. People often need the literature to be
pushed to them, not just pulled. You cannot push NC material and you
cannot push Green material. It's only used by those who know.

> This is not nearly as much of a problem in the humanities at first blush,
> but the reality is that as text mining gets better, faster, cheaper, and
> more subtle in the hard sciences, it will bring amazing tools to the
> humanities as well.

Yes. Actually, some of the linguistics groups were among the very early
users of computing in the 1960s and 1970s, building classical corpora. But
they never went outside a small group, for valid technical reasons:
punched cards.

Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
