[openbiblio-dev] [Open-access] Trying to index the malaria literature for BOAI-Openness - what has to be done paper-by-paper?

Peter Murray-Rust pm286 at cam.ac.uk
Wed Mar 28 07:29:02 UTC 2012


On Wed, Mar 28, 2012 at 5:31 AM, Nils Dagsson Moskopp <
nils at dieweltistgarnichtso.net> wrote:

> Daniel Mietchen <daniel.mietchen at googlemail.com> schrieb am Tue, 13 Mar
> 2012 16:46:22 +0100:
>
> > Google does not give a simple list of results, and it currently yields
> > over 80k hits for malaria on PMC:
> >
> https://www.google.com/search?q=malaria+site%3Awww.ncbi.nlm.nih.gov%2Fpmc.
> >
> > However, the Crawler currently being coded as part of the Open Access
> > Media Importer (cf.
> >
> http://wir.okfn.org/2012/03/10/open-access-media-importer-apology-frontend-usage/
> > ) does almost what you are looking for, and so it should not be too
> > difficult to modify it accordingly.
>
> I just implemented something that might be of value for this purpose, a
> command that outputs PMC metadata as CSV. Instructions for an sh-like
> shell follow:
>

This looks very useful. We are thinking along the same lines on
open-biblio, so I have copied them.


> git clone https://github.com/erlehmann/open-access-media-importer.git
> cd open-access-media-importer
> ./oa-get metadata pubmed
> ./oa-cache list-articles pubmed | grep Malaria | grep creativecommons
>
> Besides git and Python 2.6, you will need python-progressbar. For
> operating systems without sane package management, you can find it
> here: <http://pypi.python.org/pypi/progressbar>.
>
> Be aware that this downloads several GB of data from PubMed Central FTP
> and may take some time. If you find any errors, let me know.
>
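As a rough sketch (not part of the importer itself), the grep filter quoted
above could be wrapped in a small shell function that matches
case-insensitively, since "grep Malaria" skips lines that contain only
lowercase "malaria". The function name is hypothetical, and because the
column layout of the CSV output is not described here, it still matches on
whole lines rather than on specific fields:

```sh
# Hypothetical helper: keep only lines from stdin that mention malaria
# (in any capitalisation) and a Creative Commons licence URL.
# Assumption: each article is one line of CSV, as implied by the
# grep pipeline quoted above.
filter_open_malaria() {
    grep -i 'malaria' | grep -i 'creativecommons'
}

# Intended use (requires the importer checkout and downloaded metadata):
#   ./oa-cache list-articles pubmed | filter_open_malaria
```

Matching on whole lines means a hit in any column counts, so the result may
still need checking paper by paper.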
Is this "metadata" or does it include abstracts? I ask because systematic
downloading of abstracts is, I think, forbidden by the publishing
community. We should stick to the material outlined in the Principles of
Open Bibliography.

P.


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069