[open-science-dev] [Open-access] [open-bibliography] Fwd: [open-science] fw: Python NLTK/data mining/machine learning project of public research data, anyone interested?

Wed Sep 26 08:18:38 UTC 2012

Hi,

I have to add to this that the dataset uploaded does not yet contain all
the affiliations. I had planned to do this but haven't had the time
yet to get my system going. I'd probably need to upload more information if
more author affiliations come in, and I don't know how to update the existing
entries.

One problem is that each affiliation is a large chunk of text with no
well-defined structure - this is what we wanted to use pybossa for.
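
A minimal sketch of a first automatic pass over such text, before anything
goes to pybossa. It assumes a plain normalise-and-hash approach, not anything
already implemented in Bibserver; the function names and the 12-character key
length are hypothetical. It only catches differences in case, punctuation,
accents and line breaks; real variants ("Dep." vs "Department") would still
need human review or a smarter matcher.

```python
import hashlib
import re
import unicodedata


def normalise(affiliation: str) -> str:
    """Collapse accents, case, punctuation and whitespace in a raw affiliation."""
    text = unicodedata.normalize("NFKD", affiliation)
    # Drop combining marks so accented and unaccented spellings compare equal.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then squeeze runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def affiliation_key(affiliation: str) -> str:
    """Short stable key derived from the normalised text (hypothetical id scheme)."""
    digest = hashlib.sha1(normalise(affiliation).encode("utf-8")).hexdigest()
    return digest[:12]


def group_by_key(affiliations):
    """Group raw affiliation strings that share the same normalised key."""
    groups = {}
    for raw in affiliations:
        groups.setdefault(affiliation_key(raw), []).append(raw)
    return groups


if __name__ == "__main__":
    samples = [
        "Unilever Centre, Dep. Of Chemistry\nUniversity of Cambridge\nCB2 1EW, UK",
        "Unilever Centre, Dep. of Chemistry, University of Cambridge, CB2 1EW, UK",
    ]
    for key, raws in group_by_key(samples).items():
        print(key, len(raws))
```

The key can serve as a facet value or as a provisional identifier; strings
that end up in different groups but look alike are the ones worth showing to
volunteers.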

However, my time for working on this is very limited, especially with my
School of Data trip to Africa next month.

Michael

On Wed, Sep 26, 2012 at 09:12:38AM +0100, Laura Newman wrote:
> I spoke to Chris Taggart (Open Corporates) very briefly and in passing
> about this at OKFest. My knowledge is imperfect, but as I understand it
> part of Open Corporates is about identifying when two (different) entities
> are the same - as well as trying to chase up the chain to identify if one
> body actually owns another.
> 
> He was interested in what we were doing. Would it be worth someone talking
> to him directly?
> 
> 
> 
> 
> On Tue, Sep 25, 2012 at 7:16 PM, Mark MacGillivray <mark at cottagelabs.com> wrote:
> 
> > On Tue, Sep 25, 2012 at 6:53 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
> >
> >> At OKFest we had a very successful hackathon looking at what we could
> >> extract from bibliographic data. Michael Bauer (copied) trawled the
> >> BioMedCentral site and has extracted a large amount of bibdata. We plan to
> >> put this in Bibserver.
> >>
> >
> > He has already done so, which brought up some issues I am resolving as
> > part of a re-design anyway. Should be done next week.
> >
> >
> >
> >> One idea that we want to do is create ids for each institution mentioned
> >> in the author list, based on the text, e.g.
> >>
> >> Unilever Centre, Dep. Of Chemistry
> >> University of Cambridge
> >> CB2 1EW, UK
> >>
> >> This would allow us to create facets for institutions, create a list and
> >> browse using Bibserver. (Although we cannot formally uniquify, this is a
> >> much easier problem than authors.)
> >>
> >
> > Actually we can facet without identifiers. It can be done directly on the
> > name string. Identifiers just provide us a way to connect different names
> > for the same thing, but the problem is still the same - we need to identify
> > that two things are the same in the first place.
> >
> > Mark
> >
> >
> >
> >
> >
> >> Laurent - I came across GROBID and am keen to re-use, rather than
> >> reinvent.
> >>
> >> Perhaps we should form an informal group in this technology and
> >> coordinate some of our efforts?
> >>
> >> P.
> >>
> >>
> >> --
> >> Peter Murray-Rust
> >> Reader in Molecular Informatics
> >> Unilever Centre, Dep. Of Chemistry
> >> University of Cambridge
> >> CB2 1EW, UK
> >> +44-1223-763069
> >>
> >>
> >
> >
> 
> 
> -- 
> Laura Newman
> Community Coordinator
> Open Knowledge Foundation
> http://okfn.org/
> Skype: lauranewmanonskype
> Twitter: @Newmanlk

-- 
Data Wrangler with the Open Knowledge Foundation (OKFN.org)
GPG/PGP key: http://tentacleriot.eu/mihi.asc
Twitter: @mihi_tr Skype: mihi_tr