[open-bibliography] OCLC adds Linked Data to WorldCat.org | DDC 23 released as linked data at dewey.info

Sun Jun 24 14:47:03 UTC 2012

Thanks Karen,

On Sun, Jun 24, 2012 at 3:01 PM, Karen Coyle <kcoyle at kcoyle.net> wrote:

> Peter, I assume that the "technical difficulty" is the size of the
> database -- something on the order of 180 million bibliographic records.
> ... Ah, no, I just looked it up in their annual report:
> 235,822,950 records, of which 189,421,960 are for books. And that's from a
> year ago.
>
> It would be very interesting to be able to slice-n-dice that database,

Indeed

> which, perhaps except for the secret database that Google holds, is the
> largest bibliographic database in the world. For example, their records for
> some less-populous languages may constitute a more complete bibliography of
> the writings of that culture than even the relevant national library has.
>
> I plan to write to some of the OCLC folks

Excellent

> and ask if they will be setting up the needed index to encourage the
> re-spidering of this data by the major search engines.

And I hope also by US (the Open bibliographic community)!

> If so, I wonder if those search engines then can become an entry point to
> the data. I also have not seen mention of a SPARQL end point for searching
> the database, but I probably haven't encountered all of the technical
> information.
>

200 million complete records (whatever a complete record is - and that's
part of the problem) in RDF will slaughter most systems (we found that 20
million from PubMed was a problem). But 20 million in BibJSON with the most
important fields is quite possible. I am not sure about 200 million - but
it could be possible. Mark McG may comment.

I might upset some purists but I suspect that a collection with the main
details (as we have listed in the Open Bibliographic Principles - author,
date, publisher, title) could be extremely valuable for a very wide range
of people who currently don't use bibliography at all. It might not be full
enough for library management, but it would do very well for
undergraduates, hobbyists, citation lists in articles, and Wikipedia, etc.
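
As a rough illustration of what such a "main details" collection might look
like, here is a minimal sketch in Python that cuts a fuller record down to the
four fields named above. The field names loosely follow BibJSON conventions
but are assumptions, as is the sample record, which is invented purely for
illustration and is not taken from WorldCat.

```python
import json

# Core fields named in the message: author, date, publisher, title.
# The exact key names are an assumption made for this sketch.
CORE_FIELDS = ("author", "date", "publisher", "title")


def to_core_record(full_record):
    """Return a new dict holding only the core fields present in full_record."""
    return {key: full_record[key] for key in CORE_FIELDS if key in full_record}


# Hypothetical fuller record; every value here is made up.
full = {
    "title": "An Example Title",
    "author": [{"name": "A. N. Author"}],
    "date": "2012",
    "publisher": "Example Press",
    "holdings": ["library-1", "library-2"],
    "subject": ["example"],
}

core = to_core_record(full)
print(json.dumps(core, sort_keys=True))
```

Fields needed for library management (holdings, subjects and so on) are simply
dropped, which is where the saving in size would come from.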

If they expose an API then it should be possible to distribute the load of
extracting the metadata.
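
One way the load could be shared out, sketched in Python: assume (and this is
purely an assumption, since no such API has been described) that records could
be addressed by a running number, and give each volunteer a contiguous block.
The function name split_range and the figures used are hypothetical.

```python
def split_range(total, workers):
    """Split record numbers 0..total-1 into `workers` contiguous (start, stop) blocks.

    Each block is half-open: it covers start up to, but not including, stop.
    Block sizes differ by at most one record.
    """
    base, extra = divmod(total, workers)
    ranges = []
    start = 0
    for i in range(workers):
        # The first `extra` blocks take one additional record each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


# Illustrative figures only: roughly 200 million records over 64 volunteers.
blocks = split_range(200_000_000, 64)
print(len(blocks), blocks[0], blocks[-1])
```

Each volunteer would then extract only their own block, so nobody has to pull
the whole database.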

Of course there is always a problem in making sure this info is up-to-date
but this is an excellent place to start deploying such systems.

On the assumption that OCLC want the world to have the totality of their
data and are looking for suggestions as to how to make that happen, I thank
them.

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069