[openbiblio-dev] BibJSON and multiple languages

Wed Mar 7 22:17:38 UTC 2012

On Wed, Mar 7, 2012 at 7:00 AM, Roderic Page <r.page at bio.gla.ac.uk> wrote:
> Apologies in advance if this has been covered before.
>
> The BibJSON spec, like most bibliographic data formats assumes that the
> publication is in one language. I often come across articles in multiple
> languages (e.g., the article, abstract, and body of the text is in
> Portuguese, but an English title and abstract are also provided). Note that
> this differs from a complete translation of the article (which can be
> treated as a separate thing).

Without seeing the publication, I'd say that this isn't a
multi-language article, but a Portuguese article with alternate titles
and abstracts.  An article could easily have multiple abstracts, even
in a single language, so the alternate-language piece is just a slight
tweak on that.

> At the same time, I'd
> like to avoid complicating things. For example, I don't want to have
> to specify the language if there's only one.

You need to specify the language somewhere, even if there's only one
(unless you want to go the colonial route and say it defaults to
English if not specified -- not a good idea, in my opinion).

On Wed, Mar 7, 2012 at 12:02 PM, Mark MacGillivray <mark at odaesa.com> wrote:
> On Wed, Mar 7, 2012 at 4:39 PM, Etienne Posthumus
> <etienne.posthumus at okfn.org> wrote:
>> An alternative would be to allow either simple strings OR objects.
>> (similar to what JSON-LD does in
>> http://json-ld.org/spec/latest/json-ld-syntax/#string-internationalization
>> as mentioned by Mark previously)
>>
>> This would push the complexity to the software in the parsing and
>> displaying, but allow the end-user to still use as simple as possible
>> bibjson.

I like the suggestion of JSON-LD's approach of using simple strings
for the language declared for the entry/document and language tagged
strings (in JSON dictionaries) for non-default language strings.
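
As a rough sketch of that convention in Python (not something the
BibJSON spec defines): the record-level "language" key, the
"@value"/"@language" dictionary keys (borrowed from JSON-LD), and the
helper name localized() are all assumptions made for illustration.

```python
def localized(value, default_lang):
    """Return (text, language) pairs for a field that may be a plain
    string, a language-tagged dict, or a list mixing both."""
    items = value if isinstance(value, list) else [value]
    pairs = []
    for item in items:
        if isinstance(item, str):
            # Plain strings are in the language declared for the record.
            pairs.append((item, default_lang))
        else:
            # Tagged dicts carry their own language (assumed key names).
            pairs.append((item["@value"], item.get("@language", default_lang)))
    return pairs


# Hypothetical record: a Portuguese article with an extra English title.
record = {
    "language": "pt",
    "title": [
        "Título do artigo",
        {"@value": "Title of the article", "@language": "en"},
    ],
}
titles = localized(record["title"], record["language"])
```

The language is still stated once per record, but nothing has to be
tagged individually unless it differs from that default.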

> Hmm... our backend (elasticsearch) is very flexible, but the one thing
> that cannot happen is that something previously defined as an object
> is then supplied as a string. We would have to make it so that
> everything in the backend was an object, but that we could accept
> strings in and turn them into objects, as well as converting simple
> objects into strings on the way out.
>
> The disadvantage of that is it would interfere with the display of
> objects on the UI - right now we have a lot of power from the UI side
> by being able to write queries to the backend with full lucene
> functionality if so desired - but this requires knowledge of the shape
> of things on the backend; not a big problem when we keep things nicely
> document oriented, and tell people what those documents look like.
> BUT, if we change what the documents look like in the backend, it will
> not be so easy to explain how to write queries to it.

That's a problem with tightly coupling the front end and back end.  It
makes it harder to change either one without having to change both.  I
think it'd be better to formally decouple them up front (even if the
current mapping is the identity mapping) to make it easier to
accommodate future growth and change.
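
A minimal sketch of what such a mapping layer might look like in
Python, assuming the same illustrative conventions (a record-level
"language" key and "@value"/"@language" dictionaries). TEXT_FIELDS,
to_storage() and from_storage() are hypothetical names, not part of
BibJSON or of any existing backend code.

```python
TEXT_FIELDS = ("title", "abstract")  # hypothetical language-bearing fields


def to_storage(doc):
    """Public shape -> storage shape: every text value becomes a list of
    objects, so a field never switches between string and object."""
    lang = doc["language"]
    stored = dict(doc)
    for field in TEXT_FIELDS:
        if field not in doc:
            continue
        values = doc[field] if isinstance(doc[field], list) else [doc[field]]
        stored[field] = [
            {"@value": v, "@language": lang} if isinstance(v, str)
            else {"@value": v["@value"], "@language": v.get("@language", lang)}
            for v in values
        ]
    return stored


def from_storage(stored):
    """Storage shape -> public shape: default-language objects collapse
    back to plain strings, and one-element lists to a single value."""
    lang = stored["language"]
    doc = dict(stored)
    for field in TEXT_FIELDS:
        if field not in stored:
            continue
        values = [
            v["@value"] if v["@language"] == lang else v
            for v in stored[field]
        ]
        doc[field] = values[0] if len(values) == 1 else values
    return doc


doc = {
    "language": "pt",
    "title": "Título",
    "abstract": ["Resumo", {"@value": "Abstract", "@language": "en"}],
}
stored = to_storage(doc)
restored = from_storage(stored)
```

With the two directions kept in one place, the stored shape can change
later without changing what users submit or are told to expect.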

Tom