[openbiblio-dev] [Open-access] Trying to index the malaria literature for BOAI-Openness - what has to be done paper-by-paper?

Daniel Mietchen daniel.mietchen at googlemail.com
Wed Mar 28 08:28:20 UTC 2012

- The PMC OAI service and the PMC FTP service are the _only_ services
that may be used for automated downloading of articles from this open
access subset.
- Systematic retrieval (bulk downloading) of articles through any
other automated process is prohibited, even if you are only retrieving
articles from this subset.
- Some journals use the label "open access" for an article that is
available free at time of publication, but is still subject to
traditional copyright restrictions. Such articles are not part of this

We follow their download instructions, so we are on the safe side.
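
For anyone who would rather not depend on grep, the filtering step quoted below (`grep Malaria | grep creativecommons`) could be sketched in Python roughly as follows. This is only a minimal sketch: it assumes that `oa-cache list-articles pubmed` writes one article per line to standard output, and it matches plain case-sensitive substrings just as the grep calls do, without parsing the CSV columns. The names `filter_lines` and `main` are made up for illustration and are not part of the Open Access Media Importer.

```python
import sys


def filter_lines(lines, required=("Malaria", "creativecommons")):
    """Yield only the lines that contain every required substring.

    Matching is case-sensitive, like plain grep without -i.
    """
    for line in lines:
        if all(term in line for term in required):
            yield line


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Assumption: the input is the line-oriented output of
    # `./oa-cache list-articles pubmed`, piped into this script.
    for line in filter_lines(stdin):
        stdout.write(line)


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as filter_oa.py, it would be used as `./oa-cache list-articles pubmed | python filter_oa.py`. Because it never touches the network, it stays within the download rules quoted above.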


On Wed, Mar 28, 2012 at 9:29 AM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
> On Wed, Mar 28, 2012 at 5:31 AM, Nils Dagsson Moskopp
> <nils at dieweltistgarnichtso.net> wrote:
>> Daniel Mietchen <daniel.mietchen at googlemail.com> wrote on Tue, 13 Mar
>> 2012 16:46:22 +0100:
>> > Google does not give a simple list of results, and it currently yields
>> > over 80k hits for malaria on PMC:
>> >
>> > https://www.google.com/search?q=malaria+site%3Awww.ncbi.nlm.nih.gov%2Fpmc .
>> >
>> > However, the Crawler currently being coded as part of the Open Access
>> > Media Importer (cf.
>> >
>> > http://wir.okfn.org/2012/03/10/open-access-media-importer-apology-frontend-usage/
>> > ) does almost what you are looking for, and so it should not be too
>> > difficult to modify it accordingly.
>> I just implemented something that might be of value for this purpose, a
>> command that outputs PMC metadata as CSV. Instructions for an sh-like
>> shell follow:
> This looks very useful. We are thinking along the same lines on open-biblio,
> so I have copied them.
>> git clone https://github.com/erlehmann/open-access-media-importer.git
>> cd open-access-media-importer
>> ./oa-get metadata pubmed
>> ./oa-cache list-articles pubmed | grep Malaria | grep creativecommons
>> Besides git and Python 2.6, you will need python-progressbar. On
>> operating systems without sane package management, you can find it
>> here: <http://pypi.python.org/pypi/progressbar>.
>> Be aware that this downloads several GB of data from PubMed Central FTP
>> and may take some time. If you find any errors, let me know.
> Is this "metadata" or does it include abstracts? Because systematic
> downloading of abstracts is, I think, forbidden by the publishing
> community. We should stick to the material outlined in the Principles of
> Open Bibliography.
> P.
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
