[open-science] [SCHOLCOMM] Libre open access, copyright, patent law, and, other intellectual property matters

john wilbanks jtw at del-fi.org
Thu Mar 22 15:35:53 GMT 2012

I realize that I didn't make my point clear enough actually.

And I don't lump Heather in with Harnad. Heather asked a good question 
that I answered obliquely. For that I apologize.

I do not just want the ability for academics to text mine. I want there 
to be a robust market for text mining that includes companies who mine 
open access content for their own reasons as well as academics, and I 
want there to be a robust market of startups who provide those text 
mining services (and thus must make and distribute copies of corpuses as 
validation sets, as part of collaborations with academics that improve 
algorithms, and who also produce and sell the outputs of text mining). 
Right now text mining pretty much sucks, frankly, compared to what it 
ought to be.

Non-commercial licenses are not just a way to prevent other publishers 
from reselling content, which is often the focus of the conversation, 
but a tax on startups and companies who want to treat the literature as 
data. Here's a short list of companies trying to do just that who are 
being hamstrung by closed access, and who would be blocked under NC 
terms: Personalized Medicine (providing auto-annotation of genotypes to 
doctors' offices), Selventa (providing auto-created hypotheses 
explaining high throughput experimental biological data), Ingenuity 
(providing large databases of assertions specific to diseases or 
tissues). Those three are simply the first ones that jump to mind in 
startup land. There's ~20 more I know of, and many more that I don't.

The uncertainty around content chills venture investment, to boot. If 
the web had been NC licensed, we would not have Google, and PageRank 
would have remained where it started, as an academic theory experiment. 
That would suck, in my opinion. And the big publishers know this, which 
is precisely why they add clauses that ban mining to existing licenses 
and want commercial restrictions. I don't think they're worried about 
resale. I think they're worried about getting their lunch eaten by new 
entrants who see the market differently, as Apple did with music, as 
Google did to Microsoft (and in turn Facebook did to Google). That's why 
Elsevier has an entire unit devoted to this stuff, run by extremely 
smart people.

Then there's all of big pharma and biotech, who all maintain libraries 
and subscriptions, but are often absent from these discussions because 
of their position on patents.

Non-commercial restrictions have *side effects* that are bad for 
innovation and bad for science. We need entrepreneurs and not just 

This is not nearly as much of a problem in the humanities at first 
blush, but the reality is that as text mining gets better, faster, 
cheaper, and more subtle in the hard sciences, it will bring amazing 
tools to the humanities as well.


On 3/22/2012 1:06 AM, Peter Murray-Rust wrote:
> On Wed, Mar 21, 2012 at 11:58 PM, john wilbanks <jtw at del-fi.org>
> wrote:
>     I'm going by the BBB declarations.
> Thanks John, [and Klaus] and so am I.
> I'm happy to see robust discussion on this list - we should avoid flame
> wars.
> It's somewhat unfortunate that there seems an operational division
> between science and humanities. It would be nice to have a
> one-size-fits-all for "Open Access" but the reality may evolve to be different. The
> Harnad-Morrison-Thatcher approach could be summed up as:
> * the primary goal is that humans can somehow find a Gratis copy of the
> work to read with their eyes. It is of secondary importance whether the
> community has any rights.
> The science community on the other hand wishes to make complete use of
> the complete scholarly literature using modern technology to discover,
> index, extract, re-use, recompute, re-assemble in whatever way their
> imagination and technology runs to. (I wish to build an artificially
> intelligent chemical amanuensis by semantic analysis of the complete
> literature, for example).
> * ANY licence other than BBB-compliant prevents this ABSOLUTELY. Any
> publisher's contract prevents this absolutely.
> It is profoundly unhelpful to this cause to have people pontificating
> about absolute author's rights and quasi-religious approaches to solving
> the problem. Harnad and Morrison know nothing about high-throughput
> textmining, data extraction, eigenvector-based indexing, etc. If they
> wish to publish their own work under NC I shan't fight it.
> UK/PubMedcentral is crippled by the lack of explicit full-libre
> permission to re-use it. 20 million scientific articles of which about
> 1% are legally minable and those are extremely difficult to discover. I
> spent my "research" effort trying to find these, rather than actually
> DOING the science from them. Last week my tools read 500,000 chemical
> reactions from the patent literature, better as well as infinitely
> faster than any human on the planet. Those reactions can help to find
> new drugs, new ways of making drugs, new insights into chemistry.
> The reality is that science can operate extremely well with CC-BY. I am
> yet again preparing a clutch of articles for Biomed Central (a special
> issue with 17 APC-based articles). BMC have been running for 10 years.
> As far as I know there has been no serious misuse of the literature so
> there is no need to "protect" CC-BY.
> On a related point, institutional repositories are almost completely
> useless for modern literature analysis. They do not carry explicit
> machine-readable libre licences so we cannot by right use any of their
> content. They are fragmented - instead of the UK having ONE repository
> (say in the BL), which would be the rational thing that any scientist
> would do, they are fragmented over 200 universities at great additional cost.
> All that leads up to me thanking the RCUK for insisting on CC-BY and -
> with other scientific organizations such as Wellcome, and the Libre
> science publishers - making BBB-OpenAccess a reality. There is a great
> deal more to do, but at least we have a model that works and that
> politicians are listening to.
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069

john wilbanks
