[ok-edinburgh] Open Knowledge meeting

Thu Feb 25 13:06:18 UTC 2010

Dear Bonnie, thanks for this.

On 25/02/2010 13:20, Bonnie Webber wrote:
> Robin/Jo - Is this too far-fetched a link to the content
> of your Open Knowledge meeting?
>
> http://www.nature.com/news/2009/090824/full/news.2009.857.html
>
> The robots that people are assuming will automatically
> annotate and enrich documents created in Google Wave can
> only work if the databases and texts they need to crawl
> are themselves open.

Robin mentioned Peter Murray-Rust, and I was thinking of him too.

He gave a talk on just this subject at a workshop on text mining
applications in Manchester last year. A memorable line:
"My bots know a lot about chemistry, but nothing about copyright".

He challenged the speaker from Elsevier to commit to making text that
is currently "free" to read, but not open for re-use, available to
automated natural language processing tools. The speaker could not
commit.

It's a similar situation with OCLC: their terms of use on WorldCat
expressly prohibit any automated crawling and parsing of the
bibliographic metadata and full-text papers, even for a pure research
application.

There's been a lot of sponsorship of text mining / enhancement
techniques in the physical sciences, particularly in chemistry and
bioinformatics, where there's lots of consistent vocabulary, potential
for serendipitous findings, and generally low-hanging fruit.

The same techniques could work in law, archaeology and the social
sciences. There doesn't seem to be the same level of support there, but
researchers scratching their own itches could prove viability for
future investment - if the corpora they were working with had no
re-use restrictions.

cheers,

jo