[openbiblio-dev] Medline and UKPMC bibliography
Peter Murray-Rust
pm286 at cam.ac.uk
Thu Feb 2 19:37:36 UTC 2012
I have re-uploaded
The PubMed "RIS" is very seriously broken with respect to the Thomson Reuters "RIS"
spec. I am assuming that PubMed guessed at the format and got it wrong. Here are some
of the PubMed problems:
* entries should start with TY; PubMed entries don't
* entries should end with ER; PubMed entries don't
* lines should start with [A-Z][A-Z0-9]\s\s\-\s, but PubMed tags can be three
  or four characters long
* PubMed makes up lots of its own tags. I'm not sure whether this is allowed
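A minimal Python sketch of checking one record against the layout rules in the list above (TY first, ER last, a two-character tag followed by two spaces, a hyphen and a space). The function name `ris_problems` is hypothetical and not part of the existing parser; the tag pattern is taken from the regex quoted above, loosened only so that a bare `ER  -` terminator also passes.

```python
import re

# Tag-line pattern from the RIS rule quoted above: a capital letter, then a
# capital letter or digit, two whitespace characters, a hyphen, and then a
# whitespace character or end of line (so a bare "ER  -" is also accepted).
RIS_TAG_LINE = re.compile(r"^[A-Z][A-Z0-9]\s\s-(\s|$)")


def ris_problems(record: str) -> list[str]:
    """Return the ways one record departs from the RIS layout (empty if none)."""
    lines = [ln for ln in record.splitlines() if ln.strip()]
    problems = []
    if not lines or not lines[0].startswith("TY  - "):
        problems.append("record does not start with TY")
    if not lines or not lines[-1].startswith("ER  -"):
        problems.append("record does not end with ER")
    for ln in lines:
        if not RIS_TAG_LINE.match(ln):
            problems.append(f"bad tag line: {ln!r}")
    return problems
```

On a record that begins `PMID- ` this reports all three problems: no TY, no ER, and a four-character tag. It cannot check for invented tags without a list of the allowed ones, which is left out here.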
There is a different problem with whitespace. In PubMed we have:
DP  - 2009 Sep-Oct
TI  - [Red-breasted goose colonies on the Taimyr Peninsula: factors responsible for the
      proximity of goose nests to nests of peregrine falcons, rough-legged buzzards,
      and snowy owls].
PG  - 559-68
The current parser skips the two whitespace-prefixed lines, whereas it
should append them to the preceding line.
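A sketch of that fix in Python, assuming the MEDLINE layout (a tag of up to four capitals or digits, space-padded, then "- ", with continuation lines indented by spaces): any whitespace-prefixed line is appended to the value of the preceding tag instead of being dropped. `parse_medline` is a hypothetical name, not the current parser.

```python
import re

# Assumed MEDLINE tag line: 2-4 capitals/digits, optional padding spaces,
# then "- " and the value.
TAG_LINE = re.compile(r"^([A-Z0-9]{2,4})\s*- (.*)$")


def parse_medline(text: str) -> list[tuple[str, str]]:
    """Parse one MEDLINE record into (tag, value) pairs, joining continuations."""
    fields: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank line: nothing to do
        if line[0].isspace() and fields:
            # Continuation line: append to the previous field with one space.
            tag, value = fields[-1]
            fields[-1] = (tag, value + " " + line.strip())
            continue
        m = TAG_LINE.match(line)
        if m:
            fields.append((m.group(1), m.group(2).strip()))
        # Lines matching neither pattern are ignored in this sketch.
    return fields
```

Fed the DP/TI/PG excerpt above, this yields three fields, with the whole title joined into a single TI value.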
This is a typical mess, and we have it all the time in chemistry: people
"improve" file formats and break software. My guess is that we should have
a separate parser for MEDLINE; otherwise we will have to put lots of if/then
special cases into the RIS parser.
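With a separate MEDLINE parser, the choice of parser can be made once, up front, rather than scattered through the RIS code. A rough Python sketch, assuming that RIS records open with `TY  - ` and that PubMed MEDLINE records open with `PMID- `; `detect_format` and `parse_any` are hypothetical names, and the parsers are passed in rather than assumed to exist.

```python
from typing import Callable


def detect_format(text: str) -> str:
    """Guess the format from the first non-blank line (a heuristic only)."""
    first = next((ln for ln in text.splitlines() if ln.strip()), "")
    if first.startswith("TY  - "):
        return "ris"  # RIS records are meant to open with TY
    if first.startswith("PMID- "):
        return "medline"  # assumed: PubMed MEDLINE records open with PMID
    return "unknown"


def parse_any(text: str, parsers: dict[str, Callable[[str], object]]) -> object:
    """Dispatch to a format-specific parser so neither needs if/thens for the other."""
    fmt = detect_format(text)
    if fmt not in parsers:
        raise ValueError(f"unrecognised bibliographic format: {fmt}")
    return parsers[fmt](text)
```

Usage would look like `parse_any(text, {"ris": parse_ris, "medline": parse_medline})`, where `parse_ris` stands in (hypothetically) for the existing RIS parser.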
P
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069