[open-science] Content Mining Workshop

Fri Nov 2 20:32:48 UTC 2012

I think this is a really good idea - I'd been thinking about suggesting it
to OKF. People getting into this area spend a lot of time hacking about
with substandard tools and end up working inefficiently. I might be in the
UK from the end of February onwards and would certainly want to help.

The vital need IMO is to list all the current tools and comment briefly on
them. There are several levels (off the top of my head):
* discovery - where is the content
* scraping - getting it onto your machine (or relevant cloud)
* extracting into "readable" form (e.g. PDF2Foo) - I am blogging about this
* finding tables
* searching for the bits you want (e.g. OSCAR for chemistry)
* linking to known ontologies (e.g. PubChem)
* finding somewhere to put the results

and, of course, we need to touch on legal aspects and other procedural
matters.
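
As a rough illustration of what "list all the current tools and comment
briefly on them" could look like in practice, here is a minimal sketch of a
tool catalogue grouped by the levels above. The short level names, the
ToolEntry class and the two functions are my own assumptions for
illustration, not an existing OKF tool. The only entries are the ones named
in this message, and their comments just restate what is said here.

```python
from dataclasses import dataclass

# Short labels (assumed) for the levels listed in the message, in order.
LEVELS = [
    "discovery",         # where is the content
    "scraping",          # getting it onto your machine (or relevant cloud)
    "extraction",        # extracting into "readable" form
    "tables",            # finding tables
    "search",            # searching for the bits you want
    "ontology linking",  # linking to known ontologies
    "storage",           # finding somewhere to put the results
]


@dataclass
class ToolEntry:
    """One catalogue row: a tool, the level it serves, and a brief comment."""
    name: str
    level: str
    comment: str


def group_by_level(entries):
    """Return {level: [entries]} in LEVELS order, rejecting unknown levels."""
    grouped = {level: [] for level in LEVELS}
    for entry in entries:
        if entry.level not in grouped:
            raise ValueError(f"unknown level: {entry.level}")
        grouped[entry.level].append(entry)
    return grouped


def render(entries):
    """Render the catalogue as a plain-text list, one bullet per level."""
    lines = []
    for level, tools in group_by_level(entries).items():
        lines.append(f"* {level}")
        for tool in tools:
            lines.append(f"    {tool.name}: {tool.comment}")
    return "\n".join(lines)


# Only the examples named in the message; everything else is left empty.
EXAMPLE = [
    ToolEntry("OSCAR", "search", "mentioned as an example for chemistry"),
    ToolEntry("PubChem", "ontology linking", "mentioned as a known ontology"),
]

if __name__ == "__main__":
    print(render(EXAMPLE))
```

Levels with no entries still appear in the output, which makes the gaps in
tool coverage visible at a glance.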

I can contribute chemistry (which, although it may not be immediately
relevant to everyone, shows many of the aspects, and most people know
slightly what it is about - they have heard of molecules and chemical
structures).

On Fri, Nov 2, 2012 at 4:05 PM, Jenny Molloy <jcmcoppice12 at gmail.com> wrote:

> Hi All
>
> I've had a request from a grad student via the Oxford Open Science group
> to run a hands-on workshop on data/content mining from the scientific
> literature, which sounds like an excellent idea.
>
> ...
>

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069