[openbiblio-dev] Multilingual matters in BibJSON

Tom Morris tfmorris at gmail.com
Thu Apr 5 14:05:09 UTC 2012

On Thu, Apr 5, 2012 at 8:39 AM, Christian Wittern <cwittern at gmail.com> wrote:
> On 2012-04-05 20:24, Karen Coyle wrote:
>> Multilingual issues and JSON-LD came up on the Open Biblio call this week
>> (which was Adrian, Jim and I). I'm interested in being able to use
>> multilingual thesauri, which are beginning to be created using linked data
>> by linking terms from thesauri in different languages, as well as thesauri
>> that were "born multilingual."
> Is this within the scope of BibJSON, or is this a different project?
>> Assigning languages to descriptive metadata *text* can be tricky. That
>> doesn't mean we shouldn't do it, but in some cases there may be best ways to
>> handle the issue. For example, I know that there is often an interest in
>> translated titles. You might have articles in Russian or Japanese that have
>> the title translated into English. In that case, it is probably best to have
>> a field for "translated title" so that you know that "title" is the
>> original. However, if you have a title that is:
>> "Marie Antoinette"
>> it's a bit hard to say what language it is in. Such a title would rarely
>> be translated, but there are examples in scientific literature where a
>> scientific term is used the same across different languages.
> Names might not have a language by itself, but the spelling could still be
> different, depending on the language. JSON-LD has the concept of a default
> language, that would probably be useful here.
> But I agree, having translated titles flagged separately might be useful.

When the language is impossible to determine, text should just be left
untagged (or tagged as "unknown") rather than applying an incorrect guess, but
text almost always has a language or cultural context.  Even if the
Germans and Spaniards use the French name "Marie Antoinette"
(effectively making that her German/Spanish name too), the Koreans and
Japanese might have entirely different names (or at least
transliterated spellings).

For legacy data, I expect the bulk of the text will have no associated
language.  This will be difficult to deal with, but we can't go back
and change past cataloging practices.  The situation will improve over
time.

Even worse, I've seen a lot of bibliographic data with both original
language and translated titles concatenated together (with a huge
variety of separators =,:=,==,/,...).  Fun times...
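
As a rough illustration of untangling such fields, here is a minimal
Python sketch. The separator list is only the handful of examples named
above, and the function name is made up for this post. It requires
whitespace on both sides of a separator so that an ordinary "/" inside
a title is left alone, which is an assumption that real data will
certainly violate somewhere.

```python
import re

# Separators taken from the examples above; a real data set will need more.
# Longer alternatives come first so "==" is not consumed as a single "=".
_SEPARATOR = re.compile(r"\s+(?:==|:=|=|/)\s+")


def split_parallel_titles(raw):
    """Split a concatenated title field into its candidate titles.

    A separator only counts when it has whitespace on both sides.
    Returns a list of non-empty, stripped strings.
    """
    parts = [part.strip() for part in _SEPARATOR.split(raw)]
    return [part for part in parts if part]


titles = split_parallel_titles("初期禅宗史書の研究 = Shoki zenshū shisho no kenkyū")
```

This only separates the pieces; deciding which piece is the original
and which language each one is in is still a separate problem.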

> Absolutely right. One has to think careful about this. FWIW, I have made up
> a record from my bibtex file {which uses private fields like Author_Ja etc.}
> to what I think would be a corresponding BibJSON record:
> {"type" : "book",
> "id" : "yanagida_shoki-zenshu_1967",
> "author" : [[{"@language" : "ja",
> "@value" : "柳田聖山"},
> {"@value" : "Yanagida, Seizan",
> "@language": "en"}]
> ],
> "booktitle" : [{"@language" : "ja", "@value" : "初期禅宗史書の研究" },
> {"@language" : "en", "@value" : "Studies in the Historical Works of the
> Early Period of Chán Buddhism"},
> {"@language" : "ja-Latn", "@value" : "Shoki zenshū shisho no kenkyū"}
> ],
> (Please bear in mind that I heard first about BibJSON only a few hours ago,
> so this might be fundamentally wrong).
> This has a mixture of original script, translation and transliteration, but
> does not employ a default language.
> This looks quite complicated to me, but I assume that this will usually be
> generated by a program, so I do not worry too much about that. The author
> field looks especially ugly with the nested list to allow for multiple
> authors, I wonder if this is a good approach?

I think that's close, but I'd prefer to see the author's name
explicitly specified rather than being an implicit property.  This
allows you to include other information (affiliation, IDs, birth year,

"author" : [
  {"name" : [
    {"@language" : "ja",
     "@value" : "柳田聖山"},
    {"@language" : "en",
     "@value" : "Yanagida, Seizan"}
  ]}
]

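To show how a consumer might read that shape, here is a small Python
sketch. The "author"/"name" layout is just the one proposed in this
thread, not a settled BibJSON convention, and pick_value is a
hypothetical helper. Falling back to the first listed value when no
preferred language matches is also only one possible policy.

```python
def pick_value(values, preferred=("en",)):
    """Pick one "@value" from a list of language-tagged values.

    Tries each tag in `preferred` in order; if none is present, falls
    back to the first value in the list. Untagged values are stored
    under the key None. Returns None for an empty list.
    """
    by_lang = {}
    for item in values:
        # Keep the first value seen for each language tag.
        by_lang.setdefault(item.get("@language"), item["@value"])
    for tag in preferred:
        if tag in by_lang:
            return by_lang[tag]
    return values[0]["@value"] if values else None


record = {
    "author": [
        {"name": [
            {"@language": "ja", "@value": "柳田聖山"},
            {"@language": "en", "@value": "Yanagida, Seizan"},
        ]}
    ]
}

names_en = [pick_value(a["name"], ("en",)) for a in record["author"]]
names_ja = [pick_value(a["name"], ("ja",)) for a in record["author"]]
```

Because each author is an object, other keys can sit next to "name"
without changing how the name itself is looked up.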
