[open-development] Fwd: [NIPS workshop, call for papers] Cross-Lingual Technologies (xLiTe) - NIPS 2012 workshop

Mon Aug 20 14:48:56 UTC 2012

Hi everyone,

The research theme below looks like a really interesting avenue for us
to follow. I don't know how many conversations I've had about how the
problem is not technical standards but rather the language and meaning
behind the labels that we apply to metadata and so on.

But I would be interested in hearing your thoughts on whether
"automatic text understanding" is indeed an interesting venture, and
what will happen when the bulk of these methods are built from certain
databases just because they're available. Take Google News, for
example: I have no idea whether Google does an accurate job of
representing local communication patterns. It just makes me question
the whole relevance of this, even if it would be extremely useful to
almost 'legitimize' what many of us try to work towards.

Many thanks for your thoughts,

Caitlin

---------- Forwarded message ----------
From: Achim Rettinger <rettinger at kit.edu>
Date: Mon, Aug 20, 2012 at 10:28 AM
Subject: [NIPS workshop, call for papers] Cross-Lingual Technologies
(xLiTe) - NIPS 2012 workshop
To:

xLiTe: The workshop on 'Cross-Lingual Technologies' will be held in
conjunction with NIPS 2012. December 7, 2012. Lake Tahoe, Nevada, USA.

http://km.aifb.kit.edu/ws/xlite/

==================
Objectives
==================

Automatic text understanding has been an unsolved research problem for
many years. This partially results from the dynamic and diverging
nature of human languages, which ultimately results in many different
varieties of natural language. These variations range from the
individual level, to regional and social dialects, and up to seemingly
separate languages and language families.

However, in recent years there have been considerable achievements in
data-driven approaches to computational linguistics exploiting the
redundancy in the encoded information and the structures used. Those
approaches are mostly not language specific or can even exploit
redundancies across languages.

This progress in cross-lingual technologies is largely due to the
increased availability of multilingual data in the form of static
repositories or streams of documents. In addition, parallel and
comparable corpora like Wikipedia are easily available and constantly
updated. Finally, cross-lingual knowledge bases like DBpedia can be
used as an Interlingua to connect structured information across
languages. This helps scale traditionally monolingual tasks,
such as information retrieval and intelligent information access, to
multilingual and cross-lingual applications.

From the application side, there is a clear need for such
cross-lingual technology and services. Available systems on the market
are typically focused on multilingual tasks, such as machine
translation, and don't deal with cross-linguality. A good example is
one of the most popular news aggregators, Google News, which
collects news separately for each language. The ability to cross
the border of a particular language would help many users to consume
the breadth of news reporting by joining information in their mother
tongue with information from the rest of the world.

==================
Important Dates
==================

‣ Early Submission: Sept 16, 2012
‣ Early Notification: Oct 7, 2012
‣ Late/Re-Submission: Oct 21, 2012
‣ Late Notification: Oct 28, 2012
‣ Workshop Day: Dec 7, 2012

==================
Call for Papers
==================

The workshop on Cross-Lingual Technologies (xLiTe) offers a platform
for discussing algorithms and applications for statistical analysis of
language resources covering many languages.

The xLiTe workshop is aimed at techniques that strive for
flexibility, making them applicable across languages and language
varieties with less manual effort and less manually labeled training
data. Such approaches might also be beneficial for solving the pressing
task of analyzing the continuously evolving natural language varieties
that are not well formed. Such data typically originates from social
media, such as text messages, forum posts, or tweets, and is often
highly domain dependent.

Ideal contributions cover one or more of the topics listed below:
‣ Unsupervised and weakly supervised learning methods for
cross-lingual technologies
‣ Cross-lingual technologies beyond statistical machine translation
‣ Cross-lingual representations of linguistic structure

And cover cross-lingual tasks, such as:
‣ Information diffusion across languages
‣ Cross-lingual document linking and comparison
‣ Cross-lingual topic modeling
‣ Cross-lingual information extraction
‣ Cross-lingual semantic distances
‣ Cross-lingual semantic parsing
‣ Cross-lingual disambiguation
‣ Cross-lingual semantic annotation
‣ Cross-lingual language resources and knowledge bases

For submission instructions see
http://km.aifb.kit.edu/ws/xlite/

==================
Confirmed Speakers
==================

‣ Ryan McDonald - Google Research
‣ Bill Dolan - Microsoft Research
‣ Evan Sandhaus - New York Times
‣ Ivan Titov - Saarland University

==================
Organizers
==================

‣ Achim Rettinger - Karlsruhe Institute of Technology
‣ Xavier Carreras - Technical University of Catalunya
‣ Marko Grobelnik - Jozef Stefan Institute
‣ Juanzi Li - Tsinghua University
‣ Blaz Fortuna - Jozef Stefan Institute

--
Achim Rettinger
Karlsruhe Institute of Technology (KIT)
www.aifb.kit.edu/web/Achim_Rettinger/en