[open-bibliography] Metadata aggregators, discovery tools and libraries
Peter Murray-Rust
pm286 at cam.ac.uk
Sun Jan 23 01:05:58 UTC 2011
We need to understand more. They claim to have the following content:
Subject                               Articles
Medical sciences                        123106
Biology                                  52388
Engineering                              42237
Computers                                34224
Business & economics                     30769
Chemistry                                22486
Physics                                  19390
Education                                16778
Mathematics                              14940
Environmental studies                    14521
History                                  11755
Earth sciences                           10946
Agriculture                               8123
Pharmacy & pharmacology                   8088
Political science                         7991
Law                                       7060
Public health & safety                    6109
Children & youth                          5566
Health facilities & administration        5021
Others                                  136627
Total                                   578125
(Compare PMC with ca 20 million articles)
My immediate analysis is that they have collected links to the publishers'
sites. They do not have bibliographic records per se. I think we will still
have to scrape pages. Typical example:
ACS Chemical biology:
http://www.journaltocs.ac.uk/index.php?action=browse&subAction=subjects&publisherID=32&journalID=128&pageb=1&userQueryID=&sort=
- *Identification of SR8278, a Synthetic Antagonist of the Nuclear Heme
Receptor REV-ERB*<http://www.journaltocs.ac.uk/articleHomePage.php?id=2947548&userID=0>
- *Authors:* *Douglas Kojetin; Yongjun Wang, Theodore M. Kamenecka
Thomas P. Burris*
*Abstract:* [image, deleted by PMR]
-
- ACS Chemical Biology
DOI : 10.1021/cb1002575
*PubDate:* 2010-11-10T14:18:32Z
[image: Export to
Refworks]<http://www.refworks.com/express/expressimport.asp?vendor=JournalTOCs&filter=Refworks%20Tagged%20Format&encoding=65001&url=http%3A//www.journaltocs.ac.uk/exports/refworks.php%3FitemID=2947548_0>
This is more-or-less the journal metadata. I am not sure what came in the
RSS feed nor am I sure whether HW has transformed any of it. The actual
journal has
Identification of SR8278, a Synthetic Antagonist of the Nuclear Heme
Receptor REV-ERB
Douglas Kojetin, Yongjun Wang, Theodore M. Kamenecka, and Thomas P.
Burris*<http://pubs.acs.org/doi/abs/10.1021/cb1002575#cor1>
The Scripps Research Institute, Jupiter, Florida 33458, United States
ACS Chem. Biol., Article ASAP
*DOI: *10.1021/cb1002575
Publication Date (Web): November 2, 2010
Copyright © 2010 American Chemical Society
Note the affiliation (not in JournalTOCs), which is very important for us.
In both cases we have to scrape to extract the author names. I am not sure
whether the scrapers would have to be journal-specific.
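
A minimal sketch of the author-name extraction, assuming the only separators
are ";", "," and the word "and", and that a trailing "*" marks the
corresponding author. The function name split_authors is hypothetical, and the
rules are inferred from the single example above, so other journals may well
need different ones.

```python
import re
from typing import List

# Separators seen in the example: ";", ",", ", and", or a bare " and ".
_SEPARATOR = re.compile(r"\s*(?:[;,]\s*(?:and\s+)?|\s+and\s+)")


def split_authors(raw: str) -> List[str]:
    """Split an author string into individual names (heuristic only)."""
    names = []
    for part in _SEPARATOR.split(raw):
        # Collapse internal whitespace and drop the corresponding-author "*".
        name = " ".join(part.split()).rstrip("*").strip()
        if name:
            names.append(name)
    return names
```

On the publisher's string this gives four names. On the JournalTOCs string as
quoted above, the last two names have no separator between them, so they come
back as one entry; a splitter like this cannot recover that.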
==========
To extract the JournalTOCs metadata we have to:
* iterate over subjects (ca 20)
* iterate over journals (which includes multiple pages)
* extract the data for the issue
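
The three steps above could be sketched roughly as follows. The query
parameters are copied from the example URL earlier in this message. Everything
else is an assumption: that pageb is the page counter, that article links
always contain articleHomePage.php?id=..., and that we already hold a
(hypothetical) mapping from subject to (publisherID, journalID) pairs. The
fetch function is passed in so that nothing here touches the network.

```python
import re
from typing import Callable, Dict, Iterator, List, Tuple
from urllib.parse import urlencode

BASE = "http://www.journaltocs.ac.uk/index.php"
ARTICLE_ID = re.compile(r"articleHomePage\.php\?id=(\d+)")


def browse_url(publisher_id: int, journal_id: int, page: int = 1) -> str:
    """Build a browse URL with the parameters seen in the example link."""
    params = {
        "action": "browse",
        "subAction": "subjects",
        "publisherID": publisher_id,
        "journalID": journal_id,
        "pageb": page,  # assumed to be the page number
        "userQueryID": "",
        "sort": "",
    }
    return BASE + "?" + urlencode(params)


def iter_article_ids(
    fetch: Callable[[str], str],
    journals_by_subject: Dict[str, List[Tuple[int, int]]],
    max_pages: int = 50,
) -> Iterator[Tuple[str, int, str]]:
    """Yield (subject, journalID, articleID) for every listed article."""
    for subject, journals in journals_by_subject.items():
        for publisher_id, journal_id in journals:
            seen = set()
            for page in range(1, max_pages + 1):
                html = fetch(browse_url(publisher_id, journal_id, page))
                new_ids = [i for i in ARTICLE_ID.findall(html) if i not in seen]
                if not new_ids:
                    break  # empty or repeated page: assume we are done
                for article_id in new_ids:
                    seen.add(article_id)
                    yield subject, journal_id, article_id
```

Each article ID would then be used to fetch and scrape the article page
itself. A real run would also want a polite delay between requests.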
I can't see any evidence of back issues. It seems that they only expose the
current issues from these publishers (maybe I'm missing something).
I am not sure whether they have taken the publishers' RSS feeds or whether
they are specifically given TOCs by the publishers. If the former, then
presumably we can do that too.
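
If we did read the publishers' feeds ourselves, a first pass might look like
the sketch below. It assumes a plain RSS 2.0 feed with title, link and pubDate
elements on each item. I have not checked what the publishers' feeds actually
contain, and an RSS 1.0 (RDF) feed with namespaced elements would need
different handling. parse_rss_items is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_rss_items(xml_text: str) -> List[Dict[str, str]]:
    """Return title, link and pubDate for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the element is missing
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "pubDate": (item.findtext("pubDate") or "").strip(),
        })
    return items
```

Note that nothing in such a feed is guaranteed to carry authors or
affiliations, so scraping the article page may still be needed.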
P.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069