[openbiblio-dev] What to do with the MARC data?

Sun Feb 20 01:41:13 UTC 2011

Dear all,

We're looking at migrating a bucket-load of MARC to RDF, probably using
the bibliographia stuff, and like the idea of putting it all into dc:,
skos:, etc, as they have done themselves. This is a given, however...

...I quite like my transforms to be lossless, and there seems to be an
opportunity to bundle all the MARC biblio stuff, alongside the more
familiar stuff. I imagine it's really irritating to find that for your
specialist application you're missing some subfield or indicator which has
been dropped or merged. And there seem to be few disadvantages to chucking
it alongside.

One option (which we may well take, in addition) is to bundle MARC records
themselves. However, it seems cruel to dump some poor guy who just needs
to know his parchment from his papyrus, her globe from her atlas, back
into the rickety world of MARC. It would also reduce arguments in
tea-breaks over mappings.

Transforms via MODS are natural, and give a low-resistance path, but still
aren't 1-to-1. We could hunt down some URL space for fields and subfields,
but there are issues: order is important [100 a then d then a then d
needs to be captured, ideally via intermediate nodes (a d),(a d),...]
and the use of indicators is screwily all over the place (some are
key-like, some value-like, some like nothing else in this world). The
great thing about crosswalks is that there are so many to choose from. It
all starts to look like a nightmare.
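
To make the ordering problem concrete, here is a minimal sketch in Python of
one way it might be handled. It is not an existing transform: the
example.org URIs, the property names (field, tag, ind1, ind2, subfield,
code, value, position) and the helper functions are all invented for
illustration. It also uses one intermediate node per subfield with an
explicit position, rather than the grouped (a d),(a d) nodes suggested
above, and it dodges the indicator question by carrying both indicators as
opaque strings.

```python
from typing import List, Tuple

# Hypothetical URI spaces, invented for this sketch only.
BASE = "http://example.org/marc/"
REC = "http://example.org/record/"

Subfield = Tuple[str, str]                    # (code, value)
Field = Tuple[str, str, str, List[Subfield]]  # (tag, ind1, ind2, subfields)
Triple = Tuple[str, str, str]                 # (subject, predicate, object)


def field_node(record_id: str, field_pos: int) -> str:
    """Name the intermediate node for the field at a given position."""
    return f"{REC}{record_id}#f{field_pos}"


def field_to_triples(record_id: str, field_pos: int, field: Field) -> List[Triple]:
    """Emit triples for one field, with one node per subfield occurrence."""
    tag, ind1, ind2, subfields = field
    fnode = field_node(record_id, field_pos)
    triples = [
        (f"{REC}{record_id}", BASE + "field", fnode),
        (fnode, BASE + "tag", tag),
        (fnode, BASE + "position", str(field_pos)),
        (fnode, BASE + "ind1", ind1),  # indicators kept verbatim, uninterpreted
        (fnode, BASE + "ind2", ind2),
    ]
    for i, (code, value) in enumerate(subfields):
        snode = f"{fnode}s{i}"
        triples += [
            (fnode, BASE + "subfield", snode),
            (snode, BASE + "code", code),
            (snode, BASE + "position", str(i)),  # explicit order survives a triple store
            (snode, BASE + "value", value),
        ]
    return triples


def triples_to_field(triples: List[Triple], fnode: str) -> Field:
    """Rebuild the field from an unordered bag of triples."""
    props = {p: o for s, p, o in triples if s == fnode and p != BASE + "subfield"}
    snodes = [o for s, p, o in triples if s == fnode and p == BASE + "subfield"]
    subs = []
    for snode in snodes:
        sp = {p: o for s, p, o in triples if s == snode}
        subs.append((int(sp[BASE + "position"]), sp[BASE + "code"], sp[BASE + "value"]))
    subs.sort()  # restore the original subfield order by position
    return (props[BASE + "tag"], props[BASE + "ind1"], props[BASE + "ind2"],
            [(code, value) for _, code, value in subs])


# Placeholder values following the "100 a then d then a then d" pattern.
field = ("100", "1", " ", [("a", "A1"), ("d", "D1"), ("a", "A2"), ("d", "D2")])
triples = field_to_triples("r1", 0, field)
# Reverse the triples to mimic a store that does not preserve order.
rebuilt = triples_to_field(list(reversed(triples)), field_node("r1", 0))
assert rebuilt == field
```

The round trip only works because position is stored explicitly; leader,
control fields and character-encoding quirks are left out of the sketch
entirely.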

So I was wondering, does anyone know of a good transform (ideally with
code) which maps MARC Biblio losslessly to RDF such that the sawn woman
can be reassembled, RDF -> MARC? (Not that this is necessary, but it is
sufficient for all of the above.)

Someone must have done a good data-model analysis of MARC Biblio from a
Comp Sci perspective regarding the above, and if they've operationalised
it, that would be brilliant, too.

If not, it looks like this marginal nice-to-have addition to the main
stream would be sufficiently doom-ridden that we'll have to leave it out.
The main advantage of it for me is stopping the "dumbing down" debate
over the crosswalking, so it doesn't need end-user analysis so much as
%age-of-chatter analysis!

Dan.