[Open-access] "the publishing community" Re: Trying to index the malaria literature for BOAI-Openness - what has to be done paper-by-paper?

koltzenburg at w4w.net koltzenburg at w4w.net
Wed Mar 28 09:04:06 UTC 2012


Hi pmr,

> systematic
> downloading of abstracts is, I think, forbidden by the publishing
> community

Well, actually, publishers are birds of many feathers.
I would like to caution against othering "them" into one and the same (closed) pot :-)
Better not to lump them together as birds of only one feather,

thinks
Claudia

On Wed, 28 Mar 2012 08:29:02 +0100, Peter Murray-Rust wrote
> On Wed, Mar 28, 2012 at 5:31 AM, Nils Dagsson Moskopp <
> nils at dieweltistgarnichtso.net> wrote:
> 
> > Daniel Mietchen <daniel.mietchen at googlemail.com> schrieb am Tue, 13 Mar
> > 2012 16:46:22 +0100:
> >
> > > Google does not give a simple list of results, and it currently yields
> > > over 80k hits for malaria on PMC:
> > >
> > https://www.google.com/search?q=malaria+site%3Awww.ncbi.nlm.nih.gov%2Fpmc.
> > >
> > > However, the Crawler currently being coded as part of the Open Access
> > > Media Importer (cf.
> > >
> > http://wir.okfn.org/2012/03/10/open-access-media-importer-apology-frontend-usage/
> > > ) does almost what you are looking for, and so it should not be too
> > > difficult to modify it accordingly.
> >
> > I just implemented something that might be of value for this purpose, a
> > command that outputs PMC metadata as CSV. Instructions for an sh-like
> > shell follow:
> >
> 
> This looks very useful. We are thinking along the same lines on
> open-biblio so I have copied them.
> 
> > git clone https://github.com/erlehmann/open-access-media-importer.git
> > cd open-access-media-importer
> > ./oa-get metadata pubmed
> > ./oa-cache list-articles pubmed | grep Malaria | grep creativecommons
> >
> > Besides git and Python 2.6, you will need python-progressbar. For
> > operating systems without sane package management, you can find it
> > here: <http://pypi.python.org/pypi/progressbar>.
> >
> > Be aware that this downloads several GB of data from PubMed Central FTP
> > and may take some time. If you find any errors, let me know.
> >
> 
> Is this "metadata" or does it include abstracts? Because systematic
> downloading of abstracts is, I think, forbidden by the publishing
> community. We should stick to the material outlined in the Principles of
> Open Bibliography.
> 
> P.
> 
> -- 
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
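
The grep pipeline quoted above (`grep Malaria | grep creativecommons`) keeps only the lines that mention both terms. A minimal Python 3 sketch of the same filter follows, in case someone wants to post-process the CSV rather than grep it. The function name `filter_rows` and the sample rows are hypothetical; the real column layout of `oa-cache list-articles pubmed` is not given in this thread, so the sketch searches every field instead of assuming a particular column. Unlike the grep call, it matches case-insensitively.

```python
import csv
import io


def filter_rows(csv_text, topic="malaria", license_marker="creativecommons"):
    """Return the CSV rows that mention both the topic and the license marker.

    Every field is searched because the actual column layout is unknown.
    """
    topic = topic.lower()
    license_marker = license_marker.lower()
    matches = []
    for row in csv.reader(io.StringIO(csv_text)):
        fields = [field.lower() for field in row]
        has_topic = any(topic in field for field in fields)
        has_license = any(license_marker in field for field in fields)
        if has_topic and has_license:
            matches.append(row)
    return matches


# Hypothetical sample data, for illustration only; not real importer output.
SAMPLE = (
    '"Malaria vector control","http://creativecommons.org/licenses/by/2.0/"\n'
    '"Malaria in pregnancy","all rights reserved"\n'
    '"Dengue surveillance","http://creativecommons.org/licenses/by/2.0/"\n'
)

if __name__ == "__main__":
    for row in filter_rows(SAMPLE):
        print(row)
```

To run it on real output, one could redirect the `oa-cache` listing to a file and pass the file's contents to `filter_rows`, assuming the listing really is plain CSV as described in the quoted message.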





