[open-bibliography] MARC and other standard formats -> RDF

Karen Coyle kcoyle at kcoyle.net
Wed Jun 16 23:55:42 UTC 2010


Quoting William Waites <william.waites at okfn.org>:



> This leads to an interesting question. I'm not
> aware of a field specifying the language of text
> *in the MARC record itself*.

No, there isn't a way to do that, nor to indicate the language of an  
individual field.

> Most of the text looks
> neutral at first glance, but people have a habit
> of putting things like "[by] Foo Bar" or "electronic
> text" in some free-form text fields. Any heuristic
> for normalising this will be helped by knowledge of
> the language of the metadata as opposed to the book.

Language of the metadata gets tricky. There is a concept of "language  
of the catalog" that is used in the creation of catalog records. This  
is used for notes and for a few areas like "356 pages." It should be  
possible to indicate the language of the catalog when transforming  
data from a library catalog *before* it loses that context. That will  
NOT tell you the language of each field, and some fields (author name,  
book title) are very hard to characterize in terms of a language.
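The idea of recording the language of the catalog at conversion time, before that context is lost, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an established mapping. The record is a simplified list of (tag, value) pairs rather than a parsed MARC record. The `http://example.org/...` IRIs and the helper names are hypothetical. The catalog language is supplied by whoever runs the conversion, because the record itself does not carry it. Only fields assumed to hold cataloger-supplied text (300 physical description and the 5XX notes) get a language tag. Names and titles stay plain literals, since they are hard to characterize by language.

```python
from typing import List, Tuple


def is_catalog_language_field(tag: str) -> bool:
    # Assumption: 300 (physical description, e.g. "356 pages") and the
    # 5XX note fields are written in the language of the catalog.
    return tag == "300" or tag.startswith("5")


def escape_literal(text: str) -> str:
    # Minimal N-Triples string escaping for backslash, quote and newline.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(
    subject_iri: str, fields: List[Tuple[str, str]], catalog_lang: str
) -> List[str]:
    """Emit one N-Triples line per (tag, value) pair.

    catalog_lang is supplied by the converter (hypothetical parameter):
    it is the language of the catalog the record came from, not
    something read out of the record.
    """
    lines = []
    for tag, value in fields:
        # Hypothetical predicate vocabulary keyed on the MARC tag.
        predicate = f"http://example.org/marc/{tag}"
        literal = '"' + escape_literal(value) + '"'
        if is_catalog_language_field(tag):
            literal += "@" + catalog_lang
        # Names (100) and titles (245) are left without a language tag.
        lines.append(f"<{subject_iri}> <{predicate}> {literal} .")
    return lines


if __name__ == "__main__":
    record = [
        ("100", "Bar, Foo"),
        ("245", "An example title"),
        ("300", "356 pages"),
        ("500", "Includes index."),
    ]
    for line in record_to_ntriples("http://example.org/record/1", record, "en"):
        print(line)
```

A bilingual source such as the Canadian catalogs would simply pass a different `catalog_lang` per catalog. The per-field language problem is untouched: a note quoting a French title inside an English-language catalog would still be tagged `@en`.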

>
> Is there existing non-English language data in MARC
> format? If so, pointers would be appreciated. If not
> what source formats are we looking at for non-
> anglophone places? I know the Germans have something
> called MAB.

Try the Canadian libraries, who work in MARC in both English and French:

http://www.collectionscanada.gc.ca

>
> A lot of the practical work that is likely to follow
> onto the W3C LLD WG might involve libraries taking
> data in these source formats and transforming them
> to RDF. To what extent does the shape of the data
> in the source records inform the shape of the
> resulting triples? Is there anything we can learn from
> the (salient) differences between MARC, MAB and
> others?

Yes, I'm sure there is. Some of the differences will be because of  
different cataloging rules, others will be differences in how the data  
is encoded. Teasing those apart won't be easy, but in a sense it has  
begun as the German libraries attempt to move from MAB to MARC. They  
are asking for numerous changes in MARC so that their data will fit.


>
> Is it within the scope of the working group to
> enumerate these source data formats and provide
> recommended mappings to RDF?


More like: recommend that this task be undertaken. The LLD W3C group  
is only live for one year, but it is expected to "incubate" a number  
of follow-on tasks.

kc

>
> Cheers,
> -w
>
> --
> William Waites           <william.waites at okfn.org>
> Mob: +44 789 798 9965    Open Knowledge Foundation
> Fax: +44 131 464 4948                Edinburgh, UK
>



-- 
Karen Coyle
kcoyle at kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet