[Open-access] [open-science-dev] Fwd: [open-science] fw: Python NLTK/data mining/machine learning project of public research data, anyone interested?

Tue Sep 25 17:53:40 UTC 2012

At OKFest we had a very successful hackathon looking at what we could
extract from bibliographic data. Michael Bauer (copied) trawled the
BioMedCentral site and has extracted a large amount of bibdata. We plan to
put this in Bibserver.

One thing we want to do is create an id for each institution mentioned in
the author list, based on the affiliation text, e.g.

Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK

This would allow us to create facets for institutions, build a list of
them and browse it using Bibserver. (Although we cannot formally uniquify
institutions, this is a much easier problem than uniquifying authors.)
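
As a rough sketch of the idea (in Python, standard library only), one
could normalise the affiliation text and hash it to get a stable id. The
function names, the "inst-" prefix and the record fields "title" and
"affiliations" are made up for illustration; this is not Bibserver's
schema and it does not use GROBID.

```python
import hashlib
import re
from collections import defaultdict


def normalise_affiliation(text):
    """Lower-case the text, drop punctuation and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes a space
    return " ".join(text.split())         # newlines and runs of spaces collapse


def institution_id(text):
    """Return a short, stable id derived from the normalised text.

    The "inst-" prefix and the 10-character length are arbitrary choices.
    """
    normalised = normalise_affiliation(text)
    digest = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
    return "inst-" + digest[:10]


def facet_by_institution(records):
    """Map each institution id to the titles of the records mentioning it.

    Each record is assumed to be a dict with hypothetical keys
    "title" (str) and "affiliations" (list of str).
    """
    facets = defaultdict(list)
    for record in records:
        for affiliation in record["affiliations"]:
            facets[institution_id(affiliation)].append(record["title"])
    return dict(facets)
```

This only merges strings that differ in case, punctuation or line
breaks. Variants such as "Dep." versus "Department" would still get
different ids, so it is a starting point rather than real uniquification.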

Laurent - I came across GROBID and am keen to re-use it rather than reinvent.

Perhaps we should form an informal group around this technology and
coordinate some of our efforts?

P.

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069