[Open-access] [open-science-dev] Fwd: [open-science] fw: Python NLTK/data mining/machine learning project of public research data, anyone interested?

Wed Sep 26 03:54:01 UTC 2012

I think it would be a very good idea to do so. 
Cheers,
Laurent (disappearing for 2 days into a review panel on research infrastructures... I'll keep an eye on data openness)

BTW, we generate TEI-encoded data from PDFs, with deep (TEI-based) name and affiliation structures that could be reused for this, e.g.:

				<biblStruct>
					<analytic>
						<author>
							<persName>
								<forename type="first">K</forename>
								<forename type="middle">E</forename>
								<surname>Zimen</surname>
							</persName>
							<affiliation>
								<orgName type="department">Institut für Kernchemie</orgName>
								<orgName type="institution">Chalmers Technische Hochschule</orgName>
								<address>
									<settlement>Göteborg</settlement>
									<country key="SE">Schweden</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName>
								<forename type="first">L</forename>
								<surname>Dahl</surname>
							</persName>
							<affiliation>
								<orgName type="department">Institut für Kernchemie</orgName>
								<orgName type="institution">Chalmers Technische Hochschule</orgName>
								<address>
									<settlement>Göteborg</settlement>
									<country key="SE">Schweden</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Die Diffusion von Spaltungs-Xenon aus Uranmetall</title>
					</analytic>
					<monogr>
						<title level="j" type="main">Zeitschrift für Naturforschung A</title>
						<title level="j" type="abbrev">Z. Naturforsch. A</title>
						<imprint>
							<biblScope type="vol">12</biblScope>
							<biblScope type="fpage">167</biblScope>
							<biblScope type="lpage">169</biblScope>
							<date type="published" when="1957">1957</date>
						</imprint>
					</monogr>
					<note type="submission">eingegangen am 1. Februar 1957</note>
				</biblStruct>
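
As a rough illustration of how such a record could feed the institution-id idea quoted below, here is a minimal Python sketch using only the standard library. It assumes the fragment carries no XML namespace (as in the sample above; real TEI files usually declare one), and the key scheme in `institution_key` (ASCII-folded institution name plus country code) is a hypothetical choice for illustration, not something GROBID or Bibserver defines.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the record above (no namespace, as in the sample).
SAMPLE = """<biblStruct>
  <analytic>
    <author>
      <persName>
        <forename type="first">K</forename>
        <forename type="middle">E</forename>
        <surname>Zimen</surname>
      </persName>
      <affiliation>
        <orgName type="department">Institut für Kernchemie</orgName>
        <orgName type="institution">Chalmers Technische Hochschule</orgName>
        <address>
          <settlement>Göteborg</settlement>
          <country key="SE">Schweden</country>
        </address>
      </affiliation>
    </author>
    <author>
      <persName>
        <forename type="first">L</forename>
        <surname>Dahl</surname>
      </persName>
      <affiliation>
        <orgName type="department">Institut für Kernchemie</orgName>
        <orgName type="institution">Chalmers Technische Hochschule</orgName>
        <address>
          <settlement>Göteborg</settlement>
          <country key="SE">Schweden</country>
        </address>
      </affiliation>
    </author>
  </analytic>
</biblStruct>"""


def institution_key(affiliation):
    """Hypothetical id: ASCII-folded, slugified institution name plus country code."""
    name = affiliation.findtext("orgName[@type='institution']", default="")
    country = affiliation.find("address/country")
    code = country.get("key", "") if country is not None else ""
    # Strip diacritics so spelling variants with and without accents collapse.
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", folded.lower()).strip("-")
    return f"{slug}:{code.lower()}"


def authors_with_keys(xml_text):
    """Return (surname, forename initials, institution key) for each author."""
    root = ET.fromstring(xml_text)
    rows = []
    for author in root.iter("author"):
        surname = author.findtext("persName/surname", default="")
        initials = [f.text for f in author.findall("persName/forename") if f.text]
        affiliation = author.find("affiliation")
        key = institution_key(affiliation) if affiliation is not None else None
        rows.append((surname, initials, key))
    return rows


rows = authors_with_keys(SAMPLE)
# Count authors per institution key: the raw material for a facet.
facets = Counter(key for _, _, key in rows if key)
print(rows)
print(facets)
```

Both authors end up under the same key, which is all a facet needs. A key like this does not formally uniquify institutions; it only groups identical normalised strings.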

On 25 Sept 2012, at 19:53, Peter Murray-Rust wrote:

> At OKFest we had a very successful hackathon looking at what we could extract from bibliographic data. Michael Bauer (copied) trawled the BioMedCentral site and has extracted a large amount of bibdata. We plan to put this in Bibserver.
> 
> One idea we want to pursue is to create IDs for each institution mentioned in the author list, based on the text, e.g.
> 
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> 
> This would allow us to create facets for institutions, create a list and browse using Bibserver. (Although we cannot formally uniquify, this is a much easier problem than it is for authors.)
> 
> Laurent - I came across GROBID and am keen to re-use, rather than reinvent. 
> 
> Perhaps we should form an informal group in this technology and coordinate some of our efforts? 
> 
> P.
> 
> 
> -- 
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069

Laurent Romary
INRIA & HUB-IDSL
laurent.romary at inria.fr
