[open-linguistics] [Second Call for Participation] SemEval-2015 task 13: Multilingual All-Words Sense Disambiguation and Entity Linking

Tue Oct 28 08:48:08 UTC 2014

*Second Call for Participation*

*Multilingual All-Words Sense Disambiguation and Entity Linking*

SemEval-2015 task 13

http://alt.qcri.org/semeval2015/task13/

*Introduction*

The automatic understanding of the meaning of text has been a major goal of
research in computational linguistics and related areas for several
decades, with ambitious challenges, such as Machine Reading (Etzioni et al., 2006)
and the quest for knowledge (Schubert, 2006). Two key Natural Language
Processing tasks that need to be tackled as steps towards achieving the
goal of automatic understanding of text are Word Sense Disambiguation (WSD)
and Entity Linking (EL). WSD (Navigli, 2009) is a long-standing task aimed at
explicitly assigning meanings to single-word and multi-word occurrences
within text, a task which is today more alive than ever in the research
community. EL (Erbs et al., 2011; Cornolti et al., 2013; Rao et al., 2013)
is a more recent task which aims at discovering mentions of entities within
a text and linking them to the most suitable entry in a knowledge base. The
two main differences between WSD and EL lie in the kind of inventory used,
i.e., dictionary vs. encyclopedia, and the assumption that the mention is
complete or potentially partial, respectively. For instance, a named entity
such as “European Medicines Agency” may be referred to within a text as
simply “Medicines Agency”, the meaning of which, however, can be inferred
thanks to the context. Notwithstanding these differences, the tasks are
similar in nature, in that they both involve the disambiguation of
textual fragments according to a reference inventory. However, the research
community has hitherto tended to tackle the two tasks separately, often
duplicating efforts and solutions.
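
As a purely illustrative sketch of the second difference (complete vs. potentially partial mentions), the toy Python below contrasts an exact dictionary lookup with a lookup that also accepts partial mentions. The two inventories, the sense labels and the function names are all invented for this example; they are not taken from BabelNet or from any participating system, and a real system would also have to rank the candidates using the context.

```python
# Toy inventories: every entry and label below is invented for illustration.
DICTIONARY = {  # WSD-style inventory: lemma -> candidate senses
    "agency": ["agency#1 (administrative unit)", "agency#2 (capacity to act)"],
}
ENCYCLOPEDIA = [  # EL-style inventory: full entity names
    "European Medicines Agency",
    "European Space Agency",
]


def wsd_candidates(fragment):
    """WSD assumes a complete mention: exact lookup of the lowercased fragment."""
    return DICTIONARY.get(fragment.lower(), [])


def el_candidates(mention):
    """EL allows partial mentions: keep every entity whose name contains
    the mention's tokens as a contiguous sub-sequence."""
    m = mention.lower().split()
    if not m:
        return []
    matches = []
    for name in ENCYCLOPEDIA:
        t = name.lower().split()
        # Slide a window of len(m) tokens over the entity name.
        if any(t[i:i + len(m)] == m for i in range(len(t) - len(m) + 1)):
            matches.append(name)
    return matches


if __name__ == "__main__":
    print(wsd_candidates("agency"))           # two candidate senses
    print(el_candidates("Medicines Agency"))  # one candidate entity
    print(el_candidates("Agency"))            # still ambiguous: two entities
```

In this sketch the partial mention "Medicines Agency" retrieves a single candidate, while the bare "Agency" remains ambiguous and would need context to resolve.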

In contrast to this trend, research in knowledge acquisition is heading
towards the seamless integration of encyclopedic and lexicographic
knowledge within structured language resources (Hovy et al., 2013), and the
main representative of this new direction is undoubtedly BabelNet
http://babelnet.org (Navigli and Ponzetto, 2012). Such resources therefore
seem to provide common ground for the two tasks of WSD and EL. Only very
recently has a joint approach, called Babelfy (http://babelfy.org), been
proposed for both WSD and EL (Moro et al., 2014).

*Task description*

In this task, our goal is to promote research in the direction of joint
word sense and named entity disambiguation, so as to focus research efforts
on the aspects that differentiate these two tasks without duplicating
research for common problems within the two tasks. However, we will also
allow systems that perform only one of the two tasks to participate, and
even systems which tackle one particular setting of WSD, such as all-words
sense disambiguation or disambiguation restricted to any subset of
part-of-speech tags. Moreover,
given the recent upsurge of interest in multilingual approaches, we will
release our dataset in three different languages (English, Italian,
Spanish) on parallel corpora which will be independently and manually
annotated by different native/fluent speakers. In contrast to
SemEval-2013 task 12, Multilingual Word Sense Disambiguation (Navigli et
al., 2013), this task presents a dataset covering both
kinds of inventories (i.e., named entities and word senses) in the specific
domain of biomedicine, in an attempt to further narrow the gap
between research efforts regarding the dichotomy EL vs. WSD and those
regarding the dichotomy open domain vs. closed domain (i.e., biomedical
Information Extraction). For this reason we encourage submissions from all
these lines of research, in order that we can evaluate the distance between
approaches that exploit both kinds of knowledge (i.e., lexicographic and
encyclopedic) and approaches that work on both kinds of domain granularity
(i.e., open and closed).

*Word Senses and Named Entities inventory*

The evaluation will use BabelNet 2.5, available at http://babelnet.org/, which
contains WordNet, Wikipedia, Wiktionary, OmegaWiki, Wikidata and the Open
Multilingual WordNet.

*Important Dates*

   - Trial data ready: May 30, 2014
   - Training data ready: July 30, 2014 (there will be no training data)
   - Evaluation period starts: December 5, 2014
   - Evaluation period ends: December 20, 2014
   - Paper submission due: January 30, 2015
   - Paper reviews due: February 28, 2015
   - Camera ready due: March 30, 2015
   - SemEval workshop: Summer 2015

*Organizers*

   - Andrea Moro <http://wwwusers.di.uniroma1.it/~moro/>, *Sapienza
   University of Rome*;
   - Roberto Navigli <http://wwwusers.di.uniroma1.it/~navigli/>, *Sapienza
   University of Rome*.

*Google Group*

Please join the following Google group:

"SemEval-2015 Task 13: Multilingual all-words WSD and EL
<https://groups.google.com/forum/?hl=en#%21forum/semeval-2015-task-13>"

*References*

Marco Cornolti, Paolo Ferragina, and Massimiliano Ciaramita. 2013. A
framework for benchmarking entity-annotation systems. In Proc. of WWW,
pages 249–260.

Nicolai Erbs, Torsten Zesch, and Iryna Gurevych. 2011. Link discovery: A
comprehensive analysis. In Proc. of ICSC, pages 83–86.

Oren Etzioni, Michele Banko, and Michael J Cafarella. 2006. Machine
Reading. In Proc. of AAAI, pages 1517–1519.

Eduard H. Hovy, Roberto Navigli, and Simone P. Ponzetto. 2013.
Collaboratively built semi-structured content and Artificial Intelligence:
The story so far. Artificial Intelligence, 194:2–27.

Andrea Moro, Alessandro Raganato, and Roberto Navigli. 2014. Entity
Linking meets Word Sense Disambiguation: a Unified Approach. Transactions
of the Association for Computational Linguistics, 2:231–244.

Roberto Navigli. 2009. Word Sense Disambiguation: A survey. ACM Computing
Surveys, 41(2):1–69.

Roberto Navigli, David Jurgens, and Daniele Vannella. 2013. SemEval-2013
Task 12: Multilingual Word Sense Disambiguation. In Proc. of SemEval-2013,
pages 222–231.

Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The automatic
construction, evaluation and application of a wide-coverage multilingual
semantic network. Artificial Intelligence, 193:217–250.

Delip Rao, Paul McNamee, and Mark Dredze. 2013. Entity Linking: Finding
Extracted Entities in a Knowledge Base. In Multi-source, Multilingual
Information Extraction and Summarization, Theory and Applications of
Natural Language Processing, pages 93–115. Springer Berlin Heidelberg.

Lenhart K. Schubert. 2006. Turing’s dream and the knowledge challenge. In
Proc. of NCAI, pages 1534–1538.