[okfn-help] Fwd: MARC Importer Improvements, Sample Output

William Waites william.waites at okfn.org
Wed Jun 23 16:41:33 BST 2010



-------- Original Message --------
Subject: MARC Importer Improvements, Sample Output
Date: Wed, 23 Jun 2010 16:15:08 +0100
From: William Waites <ww at styx.org>
Organisation: Idiosyntactix Research Laboratories
To: Graham Higgins <gjh at bel-epa.com>,  "'rufus.pollock at okfn.org'"
<rufus.pollock at okfn.org>, Jonathan Gray <jonathan.gray at okfn.org>,
sara.gray at okfn.org,  Ben O'Steen <bosteen at gmail.com>

So some vast improvements, and a step or two backwards.

On the bright side, the work in
http://knowledgeforge.net/pdw/trac/ticket/101
means we can now take MARC records and make sensible
RDF out of them -- probably the most complete such thing
out there if you look at the sample files (attached to this mail
and to the ticket).

In addition to the standard changeset layer, it includes

  * provenance information using OPMV, including the
     command line used to generate the data, which should be
     sufficient for recreating it
  * proper handling of various types of identifiers -- this should
     be very helpful for deduplication. Wherever possible,
     owl:sameAs links are made as well.
  * proper handling of subject matter using controlled
     vocabularies where they are available (e.g. Library of
     Congress Subject Headings, etc.) as well as proper representation
     of people when they are listed as a topic (e.g. biographies)
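
To make the first two points above concrete, here is a rough sketch in
Python of the kind of triples involved. It is not the importer's actual
code: the function names, the example.org URIs, the identifier URI
patterns, the "marc-import" command name and the use of rdfs:comment to
hold the command line are all assumptions made for illustration. Only the
OPMV and OWL terms themselves are real vocabulary.

```python
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
OPMV = "http://purl.org/net/opmv/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"


class Literal(str):
    """Marks an object value as a plain literal rather than a URI."""


def identifier_links(subject, identifiers):
    """Yield owl:sameAs triples for identifiers that map onto a URI."""
    # Assumed URI patterns, for illustration only.
    patterns = {
        "isbn": "urn:isbn:{}",
        "lccn": "http://lccn.loc.gov/{}",
    }
    for scheme, value in identifiers:
        pattern = patterns.get(scheme)
        if pattern is not None:  # unknown schemes are skipped, not guessed
            yield (subject, OWL_SAMEAS, pattern.format(value))


def provenance(graph_uri, process_uri, source_uri, argv):
    """Describe how graph_uri was produced, including the command line."""
    return [
        (graph_uri, RDF_TYPE, OPMV + "Artifact"),
        (graph_uri, OPMV + "wasGeneratedBy", process_uri),
        (process_uri, RDF_TYPE, OPMV + "Process"),
        (process_uri, OPMV + "used", source_uri),
        # Assumption: the command line is kept as a plain comment literal.
        (process_uri, RDFS_COMMENT, Literal(" ".join(argv))),
    ]


def to_ntriples(triples):
    """Serialise (s, p, o) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"{}"'.format(escaped)
        else:
            obj = "<{}>".format(o)
        lines.append("<{}> <{}> {} .".format(s, p, obj))
    return "\n".join(lines)


if __name__ == "__main__":
    item = "http://example.org/item/1"  # hypothetical record URI
    triples = list(identifier_links(item, [("isbn", "0123456789")]))
    triples += provenance(
        "http://example.org/graph/1",
        "http://example.org/process/1",
        "http://example.org/source/records.marc",
        ["marc-import", "records.marc"],  # hypothetical command line
    )
    print(to_ntriples(triples))
```

Two records that carry the same ISBN end up with owl:sameAs links to the
same URI, which is what makes the identifiers useful for deduplication.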

Much more remains to be done: the evolution of Work/Manifestation/
Item now has to be redone, and then a second pass at deduplication
(on the Work/Manifestation level) is needed. Fortunately
there should be provenance linkages to all source material, so
you will be able to navigate down the whole tree to the individual
MARC record.
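
As a rough idea of what navigating down the tree could look like, the
sketch below walks provenance links from a Work to the records it came
from. It assumes the linkage is expressed with opmv:wasDerivedFrom, which
may not be how the importer actually records it, and the function name
and example URIs are made up for illustration.

```python
from collections import deque

OPMV = "http://purl.org/net/opmv/ns#"
DERIVED = OPMV + "wasDerivedFrom"


def source_records(triples, start):
    """Follow opmv:wasDerivedFrom from start down to the leaf resources."""
    # Index the derivation edges by subject.
    edges = {}
    for s, p, o in triples:
        if p == DERIVED:
            edges.setdefault(s, []).append(o)

    seen = {start}
    leaves = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        parents = edges.get(node, [])
        # A node with nothing further upstream is treated as a source record.
        if not parents and node != start:
            leaves.append(node)
        for parent in parents:
            if parent not in seen:  # guard against shared or cyclic links
                seen.add(parent)
                queue.append(parent)
    return leaves


if __name__ == "__main__":
    base = "http://example.org/"  # hypothetical URIs
    triples = [
        (base + "work/1", DERIVED, base + "manifestation/1"),
        (base + "work/1", DERIVED, base + "manifestation/2"),
        (base + "manifestation/1", DERIVED, base + "marc/1"),
        (base + "manifestation/2", DERIVED, base + "marc/2"),
    ]
    for record in source_records(triples, base + "work/1"):
        print(record)
```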

Cheers,
-w

-- 
William Waites                       <ww at styx.org>
Mob: +44 789 798 9965
Fax: +44 131 464 4948
CD70 0498 8AE4 36EA 1CD7  281C 427A 3F36 2130 E9F5


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: example_marc.n3
URL: <http://lists.okfn.org/pipermail/okfn-help/attachments/20100623/085cea14/attachment.asc>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: example_marc_biography.n3
URL: <http://lists.okfn.org/pipermail/okfn-help/attachments/20100623/085cea14/attachment.txt>

