[open-linguistics] Call for participation: Shared Task TIAD-2017 - Translation Inference Across Dictionaries

Mon Jan 23 11:32:55 UTC 2017

(apologies for cross-postings)

Call for participation: Shared Task TIAD-2017
Translation Inference Across Dictionaries
https://tiad2017.wordpress.com

Overview

Various methods and techniques have been explored in the past with the aim
of automatically generating new bilingual (and multilingual) dictionaries
from existing ones, for instance by using one or more languages as a
pivot between the source and target languages. However, such efforts
were usually conducted on different types of datasets and evaluated in
different ways, so their results are difficult to compare.

TIAD-2017 offers quality lexical resources for a coherent experiment that
enables reliable validation of results and solid comparison of the methods
and techniques used to generate translations across languages
automatically. The initiative also aims to stimulate further research on
the topic. It uses cross-lingual lexicographic data from K Dictionaries
(KD), which will also serve, along with human assessment, to validate the
results. The systems developed by participants and their results will be
presented at a workshop held as part of the first Language, Data and
Knowledge conference in Galway, Ireland, on 18-20 June 2017
(http://ldk2017.org). The papers describing the participant systems will
be published on CEUR-WS (http://ceur-ws.org).

Task definition

The objective of the task is to generate translations indirectly for three
language pairs, based on known translations among eight languages in 14
bilingual dictionaries. The dictionaries form four possible paths, all
from German to Brazilian Portuguese, that feature between one and four
pivot languages.

The test dataset consists of 100 randomly selected German dictionary
entries with their translations into a second language. Further
translations are then explored recursively through the chained
dictionaries, yielding up to 1,035 entries with 1,948 translation
equivalents in the largest language pair provided. Besides the headwords
and translations, the data includes information about parts of speech,
subject domains and synonyms, as well as examples of usage and their
translations.

The following language pairs are provided for the four paths:

(a)    German > English > Portuguese

(b)   German > Japanese > Spanish > Portuguese

(c)    German > Danish > French > Spanish > Portuguese

(d)   German > Dutch > Spanish > Danish > French > Portuguese
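
As an illustration of what inference along such a path involves, here is a
minimal Python sketch of naive pivot composition for path (a). It is not
the organizers' method and says nothing about the KD data format: the
dictionaries are reduced to plain headword-to-translations mappings, the
entries are invented for the example, and the function names compose and
infer_along_path are hypothetical.

```python
from collections import defaultdict

# Toy dictionaries with invented entries (NOT taken from the KD data).
DE_EN = {"Haus": {"house", "home"}, "Bank": {"bank", "bench"}}
EN_PT = {
    "house": {"casa"},
    "home": {"casa", "lar"},
    "bank": {"banco"},
    "bench": {"banco"},
}


def compose(first, second):
    """Chain two dictionaries: source -> pivot -> target."""
    result = defaultdict(set)
    for headword, pivots in first.items():
        for pivot in pivots:
            # Collect every target translation reachable through this pivot.
            result[headword] |= second.get(pivot, set())
    return dict(result)


def infer_along_path(dictionaries):
    """Compose a list of dictionaries in order, e.g. [DE_EN, EN_PT]."""
    inferred = dictionaries[0]
    for next_dictionary in dictionaries[1:]:
        inferred = compose(inferred, next_dictionary)
    return inferred


de_pt = infer_along_path([DE_EN, EN_PT])
print(de_pt)
```

Plain composition keeps every candidate reachable through any pivot, so a
polysemous pivot word can introduce wrong translations, and the effect is
likely to grow with the number of pivots. Parts of speech and subject
domains, which the data includes, are ignored here.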

Also included are four Portuguese > German datasets, for closing the loop  
in each path, to help with the validation of the results.
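
One way the closing datasets might be used is a round-trip check: keep a
candidate translation only if translating it back yields the original
headword. The Python sketch below assumes this reading; the call itself
does not prescribe it. The entries are invented and the function name
round_trip_filter is hypothetical.

```python
def round_trip_filter(candidates, back_dictionary):
    """Keep target words whose back-translations include the source headword."""
    kept = {}
    for headword, targets in candidates.items():
        confirmed = {
            target
            for target in targets
            if headword in back_dictionary.get(target, set())
        }
        if confirmed:
            kept[headword] = confirmed
    return kept


# Invented German > Portuguese candidates and a toy Portuguese > German dictionary.
candidates = {"Haus": {"casa", "lar", "lareira"}}
PT_DE = {"casa": {"Haus"}, "lar": {"Heim", "Haus"}, "lareira": {"Kamin"}}

filtered = round_trip_filter(candidates, PT_DE)
print(filtered)
```

Such a filter trades recall for precision: a correct candidate is dropped
whenever the closing dictionary happens not to list the headword.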

The three new language pairs that should be generated are:

(1)    German > Portuguese

(2)    Danish > Spanish

(3)    Dutch > French
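
The short Python sketch below lists, for each new pair, the chains of
provided dictionaries that connect source to target in the stated
direction. It rests on assumptions: only contiguous sub-chains within a
single path are considered, and the Portuguese > German closing datasets
are left out. The call does not say that these are the chains
participants must use. It also re-checks the counts given in the task
definition.

```python
# The four paths as listed in this call.
PATHS = {
    "a": ["German", "English", "Portuguese"],
    "b": ["German", "Japanese", "Spanish", "Portuguese"],
    "c": ["German", "Danish", "French", "Spanish", "Portuguese"],
    "d": ["German", "Dutch", "Spanish", "Danish", "French", "Portuguese"],
}


def chains(source, target):
    """Return, per path, the sub-chain leading from source to target."""
    found = {}
    for name, languages in PATHS.items():
        if source in languages and target in languages:
            start, end = languages.index(source), languages.index(target)
            if start < end:  # respect the translation direction of the path
                found[name] = languages[start:end + 1]
    return found


# Counts stated in the task definition: 14 dictionaries, eight languages.
n_dictionaries = sum(len(languages) - 1 for languages in PATHS.values())
n_languages = len({lang for languages in PATHS.values() for lang in languages})

for pair in [("German", "Portuguese"), ("Danish", "Spanish"), ("Dutch", "French")]:
    print(pair, chains(*pair))
```

Under these assumptions, Danish > Spanish is reachable only through French
in path (c), and Dutch > French only through Spanish and Danish in path
(d), while German > Portuguese is covered by all four paths.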

Evaluation of the results of each system will be carried out against KD’s  
manually compiled dictionaries for these pairs from the Global Series and  
other resources, as well as by human translators.
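
The call does not state which metrics will be used. As one plausible way
for participants to check their own output against a gold dictionary, the
Python sketch below computes precision, recall and F1 over (headword,
translation) pairs. The choice of metrics is an assumption, the function
name evaluate is hypothetical, and the entries are invented.

```python
def evaluate(predicted, gold):
    """Precision, recall and F1 over (headword, translation) pairs."""
    predicted_pairs = {(s, t) for s, targets in predicted.items() for t in targets}
    gold_pairs = {(s, t) for s, targets in gold.items() for t in targets}
    correct = len(predicted_pairs & gold_pairs)
    precision = correct / len(predicted_pairs) if predicted_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented system output and gold standard (NOT KD data).
predicted = {"Haus": {"casa", "lar"}, "Hund": {"cão"}}
gold = {"Haus": {"casa"}, "Hund": {"cão", "cachorro"}}

precision, recall, f1 = evaluate(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```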

Participants can contribute to either or both of the following tracks:

(1)    Systems that use only the KD data released for the task

(2)    Systems that exploit, in addition to the KD data, other freely  
available sources of background knowledge (e.g., lexical linked open data  
and parallel corpora) to improve performance

Beyond performance, participants are encouraged to consider the following  
issues in particular:

·         The role of the language family with respect to the newly  
generated pairs

·         The asymmetry of pairs, and how translation direction affects  
the results

·         The behavior of different parts-of-speech among different  
languages

Important Dates

·         23.1.2017 – Call for participation / Test data released

·         15.4.2017 – Submission of results by participants

·         30.4.2017 – Evaluation of results communicated by organizers

·         1.6.2017 – Submission of system description papers

·         18.6.2017 – Workshop

Organizers

·         Jorge Gracia, Ontology Engineering Group, Universidad  
Politécnica de Madrid

·         Noam Ordan, K Dictionaries and The Arab Academic College of  
Education, Haifa

·         Ilan Kernerman, K Dictionaries, Tel Aviv

Review Committee

To be announced on February 1.

Terms and Website

A full description of TIAD-2017 and its binding terms and regulations is  
available on the website: https://tiad2017.wordpress.com/.

Contact

Noam Ordan: noam at kdictionaries.com
-- 
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931