[openbiblio-dev] What to do with the MARC data?

Ben O'Steen bosteen at gmail.com
Sun Feb 20 10:51:37 UTC 2011


I think that one of the crucial steps is to understand *when*
information loss has actually happened. You may find that even if you
have the complete record, the loss has occurred even before you
attempt the translation.

There may be a lot of implicit information that seemed obvious to a
cataloguer ten years ago and so was omitted, but which now needs to be
in the record to make sense away from the context of its original
library.

There is also the question of convention. Almost every library I have
worked with had a selection of extensions and local, unique
conventions within their MARC, arising from historical, logistical or
even unknown needs: extensions that caused the second subfield to mean
something different, or the real shelfmark to be held in an xx9 field,
for example.
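
For example, something quick along the lines of the sketch below
(pymarc here, and the file name is made up) will usually surface those
local conventions before you commit to a mapping; tags containing a 9
are reserved for local use, so they are a good place to start looking.

    # Rough sketch: count the tags used across a batch of MARC records
    # and flag the locally-defined ranges (9xx, x9x, xx9) for inspection.
    from collections import Counter
    from pymarc import MARCReader

    tags = Counter()
    with open("records.mrc", "rb") as fh:      # made-up file name
        for record in MARCReader(fh):
            if record is None:                 # skip unreadable records
                continue
            for field in record.get_fields():
                tags[field.tag] += 1

    for tag, count in sorted(tags.items()):
        note = " <- possible local convention" if "9" in tag else ""
        print(tag, count, note)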

In summary, I'd suggest aiming for a decent translation in which the
important information is carried across, and then focusing on
repairing and interpreting that data, making all the implicit,
convention-obscured data explicit. You may find the data you have is
pretty lossy already!
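
As a rough illustration of that first pass (pymarc plus rdflib here,
and the record URIs are made up), the idea is just to map the
unambiguous fields and keep everything else for the repair stage:

    # Minimal sketch of "carry the important information across first":
    # map an obvious field (245 $a) to Dublin Core and leave the rest of
    # the record for a later repair/interpretation pass.
    from pymarc import MARCReader
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC

    g = Graph()
    with open("records.mrc", "rb") as fh:      # made-up file name
        for i, record in enumerate(MARCReader(fh)):
            if record is None:
                continue
            subject = URIRef(f"http://example.org/record/{i}")  # made-up URIs
            for field in record.get_fields("245"):
                for title in field.get_subfields("a"):
                    g.add((subject, DC.title, Literal(title)))

    print(g.serialize(format="turtle"))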

Ben

On 20 February 2011 01:41, Dan Sheppard <dan.sheppard at caret.cam.ac.uk> wrote:
> Dear all,
>
> We're looking at migrating a bucket-load of MARC to RDF, probably using
> the bibliographia stuff, and like the idea of putting it all into dc:,
> skos:, etc., as they have done themselves. This is a given, however...
>
> ...I quite like my transforms to be lossless, and there seems to be an
> opportunity to bundle all the MARC biblio stuff, alongside the more
> familiar stuff. I imagine it's really irritating to find that for your
> specialist application you're missing some subfield or indicator which has
> been dropped or merged. And there seem to be few disadvantages to chucking
> it alongside.
>
> One option (which we may well take, in addition) is to bundle MARC records
> themselves. However, it seems cruel to dump some poor guy who just needs
> to know his parchment from his papyrus, her globe from her atlas, back
> into the rickety world of MARC. It would also reduce arguments in
> tea-breaks over mappings.
>
> Transforms via MODS are natural and give a low-resistance path, but
> still aren't 1-to-1. We could hunt down some URL space for fields and
> subfields, but there seem to be issues: order is important [100 a then
> d then a then d needs to be captured, ideally via intermediate nodes
> (a d),(a d),...], and the use of indicators is screwily all over the
> place (some are key-like, some value-like, some like nothing else in
> this world). The great thing about crosswalks is that there are so
> many to choose from. It all starts to look like a nightmare.
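>
> Something like the rough rdflib sketch below is roughly the shape I
> mean (the marc: vocabulary is made up): each subfield occurrence gets
> its own node with an explicit position, which is enough to reassemble
> the original a, d, a, d order, and the indicators ride along as plain
> literals.
>
>     # Hypothetical lossless-ish shape: one node per field, one node per
>     # subfield occurrence, with an explicit position to keep the order.
>     from rdflib import BNode, Graph, Literal, Namespace, URIRef
>
>     MARC = Namespace("http://example.org/marc/")   # made-up vocabulary
>     g = Graph()
>     rec = URIRef("http://example.org/record/1")    # made-up record URI
>
>     field = BNode()
>     g.add((rec, MARC.field, field))
>     g.add((field, MARC.tag, Literal("100")))
>     g.add((field, MARC.ind1, Literal("1")))
>     g.add((field, MARC.ind2, Literal(" ")))
>
>     subfields = [("a", "a-1"), ("d", "d-1"), ("a", "a-2"), ("d", "d-2")]
>     for pos, (code, value) in enumerate(subfields):
>         sf = BNode()
>         g.add((field, MARC.subfield, sf))
>         g.add((sf, MARC.code, Literal(code)))
>         g.add((sf, MARC.value, Literal(value)))
>         g.add((sf, MARC.position, Literal(pos)))
>
>     print(g.serialize(format="turtle"))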
>
> So I was wondering, does anyone know of a good transform (ideally with
> code) which maps MARC Biblio losslessly to RDF, such that the sawn
> woman can be reassembled, RDF -> MARC? (Not that this is necessary,
> but it is sufficient for all of the above.)
>
> Someone must have done a good data-model analysis of MARC Biblio from a
> Comp Sci perspective regarding the above, and if they've operationalised
> it, that would be brilliant, too.
>
> If not, it looks like this marginal nice-to-have addition to the main
> stream would be sufficiently doom-ridden that we'll have to leave it
> out. The main advantage of it for me is stopping the "dumbing down"
> debate over the crosswalking, so it doesn't need end-user analysis so
> much as a %age-of-chatter analysis!
>
> Dan.