[open-bibliography] [Open-access] [open-science-dev] Fwd: [open-science] fw: Python NLTK/data mining/machine learning project of public research data, anyone interested?

Peter Murray-Rust pm286 at cam.ac.uk
Mon Aug 20 13:19:05 UTC 2012

On Mon, Aug 20, 2012 at 2:04 PM, Laurent Romary <laurent.romary at inria.fr> wrote:

> It depends what you mean by crawler. Can you say more about this?
> Recursively crawls publisher->journal->issue->article
> This we do not have.
Great!! Then we interface directly.

>> In PEER, I was the one to develop the XSLT stylesheets from the various
>> publishers' formats (ScholarOne, various versions of NLM, Elsevier, Nature,
>> ...) to TEI. I have never managed to put this together in SF, but could zip
>> this to whoever would want to push things further.
>> This assumes that one has XML. I am working on the assumption that we
>> have the PDF only (and that's an advantage for getting the material out of
>> diagrams)
> We actually worked on both scenarios in PEER. So the software on SF works
> directly with PDFs and the stylesheets are there because we also got a huge
> amount of data directly from publishers.

The problem with material from publishers is that it is usually a one-off
provision and there are often legal constraints.


> Laurent
> P.
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
> Laurent Romary
> laurent.romary at inria.fr

Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge