[open-bibliography] Metadata aggregators, discovery tools and libraries

Peter Murray-Rust pm286 at cam.ac.uk
Sun Jan 23 01:05:58 UTC 2011


We need to understand more about JournalTOCs. They claim to have the
following content:

Medical sciences                      123106
Biology                                52388
Engineering                            42237
Computers                              34224
Business & economics                   30769
Chemistry                              22486
Physics                                19390
Education                              16778
Mathematics                            14940
Environmental studies                  14521
History                                11755
Earth sciences                         10946
Agriculture                             8123
Pharmacy & pharmacology                 8088
Political science                       7991
Law                                     7060
Public health & safety                  6109
Children & youth                        5566
Health facilities & administration      5021
*Others*                              136627
*Total*                               578125
(Compare PMC with ca 20 million articles)



My immediate analysis is that they have collected links to the publishers'
sites. They do not have bibliographic records per se. I think we will still
have to scrape pages. Typical example:

ACS Chemical biology:
http://www.journaltocs.ac.uk/index.php?action=browse&subAction=subjects&publisherID=32&journalID=128&pageb=1&userQueryID=&sort=

   - *Identification of SR8278, a Synthetic Antagonist of the Nuclear Heme
   Receptor REV-ERB*<http://www.journaltocs.ac.uk/articleHomePage.php?id=2947548&userID=0>
      - *Authors:* *Douglas Kojetin; Yongjun Wang, Theodore M. Kamenecka
      Thomas P. Burris*
      *Abstract:*  [image, deleted by PMR]
      -
      - ACS Chemical Biology
      DOI : 10.1021/cb1002575
      *PubDate:* 2010-11-10T14:18:32Z
      [image: Export to
Refworks]<http://www.refworks.com/express/expressimport.asp?vendor=JournalTOCs&filter=Refworks%20Tagged%20Format&encoding=65001&url=http%3A//www.journaltocs.ac.uk/exports/refworks.php%3FitemID=2947548_0>


This is more or less the journal metadata. I am not sure what came in the
RSS feed, nor whether HW has transformed any of it. The actual journal
page has:

Identification of SR8278, a Synthetic Antagonist of the Nuclear Heme
Receptor REV-ERB

Douglas Kojetin, Yongjun Wang, Theodore M. Kamenecka, and Thomas P.
Burris*<http://pubs.acs.org/doi/abs/10.1021/cb1002575#cor1>
The Scripps Research Institute, Jupiter, Florida 33458, United States
ACS Chem. Biol., Article ASAP
*DOI: *10.1021/cb1002575
Publication Date (Web): November 2, 2010
Copyright © 2010 American Chemical Society

Note the affiliation (not in JournalTOCs) - which is very important for us.

In both cases we will have to scrape to extract the author names. I am not
sure whether the scrapers will have to be journal-specific.
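
As a rough illustration of the author-name problem, here is a minimal sketch
in Python. The function name split_authors and the separator rules (commas,
semicolons, "and", a trailing "*" for the corresponding author) are my own
assumptions based only on the two strings quoted above; other journals may
well need different rules.

```python
import re

# Separators seen in the two examples above: ";" and "," (optionally
# followed by "and"), or a bare " and ". This is an assumption, not a
# general rule for all journals.
_SEPARATORS = re.compile(r"\s*[;,]\s*(?:and\s+)?|\s+and\s+")


def split_authors(raw):
    """Split a raw author string into a list of cleaned names."""
    names = []
    for part in _SEPARATORS.split(raw):
        # Drop the "*" that marks the corresponding author, if present.
        name = part.strip().rstrip("*").strip()
        if name:
            names.append(name)
    return names


acs = ("Douglas Kojetin, Yongjun Wang, Theodore M. Kamenecka, "
       "and Thomas P. Burris*")
jtocs = ("Douglas Kojetin; Yongjun Wang, Theodore M. Kamenecka "
         "Thomas P. Burris")

acs_names = split_authors(acs)
jtocs_names = split_authors(jtocs)
```

With the publisher's string this gives four names. With the JournalTOCs
string as it appears in this message the last two authors stay fused, because
there is no separator between them to split on.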

==========
To extract the JournalTOCs metadata we have to:
* iterate over subjects (ca 20)
* iterate over journals (which includes multiple pages)
* extract the data for the issue
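
The steps above could be sketched roughly as follows. The query parameters
are copied from the ACS Chemical Biology URL quoted earlier; that "pageb" is
the page counter is my guess. The names browse_url and crawl, and the fetch
and parse_items callables, are hypothetical placeholders. I do not know the
URL for listing subjects, so the sketch assumes the (publisherID, journalID)
pairs have already been collected per subject.

```python
from urllib.parse import urlencode

BASE = "http://www.journaltocs.ac.uk/index.php"


def browse_url(publisher_id, journal_id, page=1):
    """Build a browse URL in the same shape as the example quoted above."""
    params = [
        ("action", "browse"),
        ("subAction", "subjects"),
        ("publisherID", publisher_id),
        ("journalID", journal_id),
        ("pageb", page),          # assumed to be the page number
        ("userQueryID", ""),
        ("sort", ""),
    ]
    return BASE + "?" + urlencode(params)


def crawl(journals_by_subject, fetch, parse_items, max_pages=50):
    """Yield (subject, item) for every item found.

    journals_by_subject: dict mapping subject name to a list of
        (publisher_id, journal_id) pairs (collected separately).
    fetch: callable taking a URL and returning the page text.
    parse_items: callable taking page text and returning a list of items;
        an empty list is taken to mean there are no more pages.
    """
    for subject, journals in journals_by_subject.items():
        for publisher_id, journal_id in journals:
            for page in range(1, max_pages + 1):
                text = fetch(browse_url(publisher_id, journal_id, page))
                items = parse_items(text)
                if not items:
                    break
                for item in items:
                    yield subject, item
```

Passing fetch and parse_items in as arguments keeps the loop testable without
hitting the site; the real scraping would live in parse_items.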

I can't see any evidence of back issues. It seems that they only expose the
current issues from these publishers (maybe I'm missing something).

I am not sure whether they have taken the publishers' RSS feeds or whether
they are specifically given TOCs by the publishers. If the former, then
presumably we can also do that.
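
If we do read the publishers' feeds ourselves, something like the following
might be a starting point. This is only a sketch: SAMPLE_FEED is a
hand-written RSS 2.0 fragment, not a real publisher feed, and I am assuming
authors appear as Dublin Core dc:creator elements, which will not be true
everywhere. The function name read_items is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace, in ElementTree's {uri}tag notation.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hand-written sample, NOT a real publisher feed. The title and link are
# taken from the article quoted earlier in this message.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Hand-written sample feed</title>
    <item>
      <title>Identification of SR8278, a Synthetic Antagonist of the Nuclear Heme Receptor REV-ERB</title>
      <link>http://pubs.acs.org/doi/abs/10.1021/cb1002575</link>
      <dc:creator>Douglas Kojetin</dc:creator>
      <dc:creator>Yongjun Wang</dc:creator>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return one dict per <item> with title, link and creators."""
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "creators": [c.text.strip()
                         for c in item.findall(DC + "creator") if c.text],
        })
    return records


records = read_items(SAMPLE_FEED)
```

Note that a feed like this would still not give us the affiliation, so the
publisher's page would have to be scraped for that in any case.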

P.
-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069