[open-linguistics] Question: replacing language codes in a SPARQL BIND statement?
Felix Sasaki
fsasaki at w3.org
Mon Mar 14 21:37:50 UTC 2016
> Am 13.03.2016 um 12:09 schrieb Christian Chiarcos <chiarcos at informatik.uni-frankfurt.de>:
>
> Dear all,
>
> this is a general technical question, albeit one specific to working with multilinguality issues in multiple lemon/ontolex dictionaries, hence I'm asking here in the first place.
>
> Imagine the following situation: I use the Russian DBnary (provided in a slightly extended variant of the old lemon) and an ontolex dictionary for Chalkan (with Russian glosses). Both are provided by third parties, and I do not want to manipulate the data prior to querying. Now, I want to use DBnary to retrieve an English gloss for the Chalkan words in a single SPARQL query.
>
> If both dictionaries use the same xml:lang representation, this works rather well (I omit the query for brevity): I bind the Russian gloss from the Chalkan dictionary to the variable ?ru and then search DBnary for a data property that has ?ru as its literal value.
>
> It is more complicated, though, if the two files use different language codes, e.g., ISO 639-3 (rus) and ISO 639-1 (ru) for Russian, or if a language code with a region subtag is used (e.g., ru-RU). Is there any way to use, say, BIND to bind the string value of ?ru to a new variable that uses ISO 639-1 codes instead of the original ISO 639-3 (or ISO 639-1 + ISO 3166) codes?
xml:lang allows only BCP 47 language tags, and these do not offer the choice you describe (e.g. "rus" vs. "ru"): for languages that have a two-letter ISO 639-1 code, BCP 47 registers only that code, so Russian must be tagged "ru". So if you use a language tag validator, you can at least detect that an xml:lang value is not valid.
E.g. validate
<!DOCTYPE html>
<html lang="ru">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
<title>Test</title>
</head>
<body>
</body>
</html>
via
https://validator.w3.org/#validate_by_input
Then validate the same document with
<html lang="rus">
and you get an error.
Of course, you would not want to integrate the full HTML validator into your workflow just to check language tags. But the underlying library
https://about.validator.nu/
has a class for validating language tags on its own.
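This is not the validator.nu API. It is a minimal Python sketch of the same kind of check, under two stated simplifications: it only accepts language[-script][-region] tags (full BCP 47 also allows extlang, variant, extension and private-use subtags), and it uses a small hypothetical mapping table instead of the full IANA Language Subtag Registry.

```python
import re

# Simplified pattern: language[-script][-region] only. Full BCP 47 (RFC 5646)
# also allows extlang, variant, extension and private-use subtags.
SIMPLE_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z]{4})?(-(?:[A-Za-z]{2}|[0-9]{3}))?")

# Hypothetical excerpt of a three-letter -> two-letter table. For languages
# with an ISO 639-1 code, BCP 47 registers only that two-letter code, so
# "rus" is not a valid primary subtag while "ru" is.
THREE_TO_TWO = {"rus": "ru", "eng": "en", "deu": "de", "ger": "de"}


def check_tag(tag):
    """Return None if the tag passes this simplified check, else a message."""
    if not SIMPLE_TAG.fullmatch(tag):
        return f"{tag!r} is not a well-formed language[-script][-region] tag"
    primary = tag.split("-")[0].lower()
    if primary in THREE_TO_TWO:
        return f"{tag!r}: use {THREE_TO_TWO[primary]!r} instead of {primary!r}"
    return None


if __name__ == "__main__":
    for t in ["ru", "ru-RU", "rus", "ru_RU"]:
        print(t, check_tag(t))
```

Note that "ru-RU" passes: it is a valid tag, just not the one DBnary uses, so a validator alone will not align the two datasets.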
Best,
Felix
>
> At the moment, I see only one way to solve this problem, i.e., using FILTER, str() and a string comparison of both variables. This is likely to be fairly inefficient, though, as I presume the FILTER is applied only after all potential bindings of both variables to Russian terms have been computed.
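To illustrate why a FILTER over str() values can be costly in the worst case assumed in the question, here is a minimal Python sketch that simulates two join strategies in memory rather than in a triple store. The data, identifiers and function names are hypothetical, and a real SPARQL engine may optimize the FILTER differently. Comparing every pair costs len(left) * len(right) string comparisons, while indexing one side by its lexical form needs one lookup per left-hand row.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two dictionaries. Each gloss is a
# (lexical form, language tag) pair, as in an RDF literal.
chalkan_glosses = [
    ("chalkan:word1", ("вода", "rus")),
    ("chalkan:word2", ("дом", "ru-RU")),
]
dbnary_entries = [
    ("dbnary:voda", ("вода", "ru"), "water"),
    ("dbnary:dom", ("дом", "ru"), "house"),
    ("dbnary:les", ("лес", "ru"), "forest"),
]


def join_with_filter(left, right):
    """Nested loop over all pairs, like FILTER(str(?a) = str(?b))."""
    results, comparisons = [], 0
    for word, (form_l, _lang_l) in left:
        for _entry, (form_r, _lang_r), english in right:
            comparisons += 1
            if form_l == form_r:  # tags are ignored, as with str()
                results.append((word, english))
    return results, comparisons


def join_with_index(left, right):
    """Index the right side by lexical form, then probe once per left row."""
    index = defaultdict(list)
    for _entry, (form_r, _lang_r), english in right:
        index[form_r].append(english)
    results = []
    for word, (form_l, _lang_l) in left:
        for english in index.get(form_l, []):
            results.append((word, english))
    return results


if __name__ == "__main__":
    print(join_with_filter(chalkan_glosses, dbnary_entries))
    print(join_with_index(chalkan_glosses, dbnary_entries))
```

Both strategies return the same pairs; only the amount of work differs.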
>
> Am I overlooking anything?
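SPARQL 1.1 does offer a way to re-tag a literal inside BIND: STR() returns the lexical form without its language tag, and STRLANG() builds a new literal with a given tag. Because the target language (Russian) is known in advance, no code mapping table is needed; the original tag, whether "rus" or "ru-RU", is simply replaced. A minimal sketch, with the query assembled as a Python string; the ex: prefix and the properties ex:ruGloss, ex:writtenForm and ex:enGloss are hypothetical placeholders for the real Chalkan and DBnary vocabularies. Whether an engine uses the BIND result for an index lookup on the following triple pattern, rather than evaluating it late like a FILTER, depends on its optimizer.

```python
# Hypothetical prefix and properties: ex:ruGloss (Chalkan side) and
# ex:writtenForm / ex:enGloss (DBnary side) are placeholders, not the
# actual vocabularies of either dataset.


def build_gloss_query(target_lang: str = "ru") -> str:
    """Return a SPARQL 1.1 query that re-tags ?ruRaw with target_lang via BIND."""
    return f"""PREFIX ex: <http://example.org/>
SELECT ?chalkan ?en WHERE {{
  ?chalkan ex:ruGloss ?ruRaw .
  # STR() drops the original tag (e.g. "rus" or "ru-RU");
  # STRLANG() attaches the tag the DBnary literals use.
  BIND(STRLANG(STR(?ruRaw), "{target_lang}") AS ?ru)
  ?dbnaryEntry ex:writtenForm ?ru ;
               ex:enGloss ?en .
}}"""


if __name__ == "__main__":
    print(build_gloss_query())
```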
>
> Best,
> Christian
> --
> Prof. Dr. Christian Chiarcos
> Applied Computational Linguistics
> Johann Wolfgang Goethe Universität Frankfurt a. M.
> 60054 Frankfurt am Main, Germany
>
> office: Robert-Mayer-Str. 10, #401b
> mail: chiarcos at informatik.uni-frankfurt.de
> web: http://acoli.cs.uni-frankfurt.de
> tel: +49-(0)69-798-22463
> fax: +49-(0)69-798-28931
>
>