[Open-access] [open-bibliography] [open-science-dev] Fwd: [open-science] fw: Python NLTK/data mining/machine learning project of public research data, anyone interested?
Mark MacGillivray
mark at cottagelabs.com
Tue Sep 25 18:16:00 UTC 2012
On Tue, Sep 25, 2012 at 6:53 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
> At OKFest we had a very successful hackathon looking at what we could
> extract from bibliographic data. Michael Bauer (copied) trawled the
> BioMedCentral site and has extracted a large amount of bibdata. We plan to
> put this in Bibserver.
>
He has already done so, which brought up some issues I am resolving as part
of a re-design anyway. Should be done next week.
> One idea we want to pursue is to create ids for each institution mentioned
> in the author list, based on the text, e.g.
>
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
>
> This would allow us to create facets for institutions, create a list and
> browse using Bibserver. (Although we cannot formally uniquify, this is a
> much easier problem than authors.)
>
Actually we can facet without identifiers. It can be done directly on the
name string. Identifiers just provide us a way to connect different names
for the same thing, but the problem is still the same - we need to identify
that two things are the same in the first place.
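
To illustrate, here is a minimal Python sketch of faceting directly on
the name string. The records and the "affiliation" field name are made
up for the example; this is not Bibserver's actual schema or code, just
a plain count of exact string values.

```python
from collections import Counter

# Hypothetical records; field names are illustrative only.
records = [
    {"title": "Paper A", "affiliation": "University of Cambridge"},
    {"title": "Paper B", "affiliation": "University of Cambridge"},
    {"title": "Paper C", "affiliation": "Univ. of Cambridge"},
    {"title": "Paper D", "affiliation": "University of Oxford"},
]


def facet_counts(records, field):
    """Count how many records carry each exact string value of `field`."""
    return Counter(r[field] for r in records if r.get(field))


counts = facet_counts(records, "affiliation")
for name, n in counts.most_common():
    print(f"{n:3d}  {name}")
```

No identifiers are needed to get the facet, but "University of
Cambridge" and "Univ. of Cambridge" come out as separate values. Deciding
that they are the same institution is the step that remains either way.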
Mark
> Laurent - I came across GROBID and am keen to re-use, rather than
> reinvent.
>
> Perhaps we should form an informal group around this technology and
> coordinate some of our efforts?
>
> P.
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069