[open-science-dev] [Open-access] [open-bibliography] Fwd: [open-science] fw: Python NLTK/data mining/machine learning project of public research data, anyone interested?

Laura Newman laura.newman at okfn.org
Wed Sep 26 08:12:38 UTC 2012


I spoke to Chris Taggart (Open Corporates) briefly about this at OKFest.
My knowledge is imperfect, but as I understand it, part of Open Corporates
is about identifying when two apparently different entities are the same,
as well as tracing up the ownership chain to identify whether one body
actually owns another.

He was interested in what we were doing. Would it be worth someone talking
to him directly?
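
For anyone who wants something concrete to poke at, here is a rough sketch
of the matching step discussed further down the thread: normalise the
affiliation text, derive an id from the normalised form, and facet on the
result. This is only an illustration under my own assumptions, not what
Bibserver or Open Corporates actually do. The function names and the
abbreviation table are hypothetical, and hashing the normalised string is
just one simple way to get a stable id.

```python
import hashlib
import re
from collections import Counter

# Hypothetical abbreviation table; a real one would need curating by hand.
ABBREVIATIONS = {"dep": "department", "dept": "department", "univ": "university"}


def normalise_affiliation(text):
    """Lower-case, drop punctuation and line breaks, expand a few abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def institution_id(text):
    """Derive a short, stable id from the normalised affiliation string."""
    normalised = normalise_affiliation(text)
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()[:12]


def facet_counts(affiliations):
    """Count records per normalised affiliation string (a facet without ids)."""
    return Counter(normalise_affiliation(a) for a in affiliations)


if __name__ == "__main__":
    variant_a = "Unilever Centre, Dep. Of Chemistry\nUniversity of Cambridge\nCB2 1EW, UK"
    variant_b = "Unilever Centre, Department of Chemistry, University of Cambridge, CB2 1EW, UK"
    # Both spellings collapse to the same normalised string, so the ids match.
    print(institution_id(variant_a) == institution_id(variant_b))
    print(facet_counts([variant_a, variant_b, "University of Oxford"]))
```

This only catches differences in spelling, punctuation and layout. Two
genuinely different names for the same body, or one body owning another,
would still need a curated mapping or a smarter matcher.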




On Tue, Sep 25, 2012 at 7:16 PM, Mark MacGillivray <mark at cottagelabs.com> wrote:

> On Tue, Sep 25, 2012 at 6:53 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>
>> At OKFest we had a very successful hackathon looking at what we could
>> extract from bibliographic data. Michael Bauer (copied) trawled the
>> BioMedCentral site and has extracted a large amount of bibdata. We plan to
>> put this in Bibserver.
>>
>
> He has already done so, which brought up some issues I am resolving as
> part of a re-design anyway. Should be done next week.
>
>
>
>> One thing we want to do is create ids for each institution mentioned
>> in the author list, based on the text, e.g.
>>
>> Unilever Centre, Dep. Of Chemistry
>> University of Cambridge
>> CB2 1EW, UK
>>
>> This would allow us to create facets for institutions, create a list and
>> browse using Bibserver. (Although we cannot formally uniquify, this is a
>> much easier problem than authors.)
>>
>
> Actually we can facet without identifiers. It can be done directly on the
> name string. Identifiers just provide us a way to connect different names
> for the same thing, but the problem is still the same - we need to identify
> that two things are the same in the first place.
>
> Mark
>
>
>
>
>
>> Laurent - I came across GROBID and am keen to re-use, rather than
>> reinvent.
>>
>> Perhaps we should form an informal group in this technology and
>> coordinate some of our efforts?
>>
>> P.
>>
>>
>> --
>> Peter Murray-Rust
>> Reader in Molecular Informatics
>> Unilever Centre, Dep. Of Chemistry
>> University of Cambridge
>> CB2 1EW, UK
>> +44-1223-763069
>>
>> _______________________________________________
>> open-bibliography mailing list
>> open-bibliography at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-bibliography
>>
>>
>
> _______________________________________________
> open-access mailing list
> open-access at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-access
>
>


-- 
Laura Newman
Community Coordinator
Open Knowledge Foundation
http://okfn.org/
Skype: lauranewmanonskype
Twitter: @Newmanlk