[open-linguistics] Second Call for Papers - NLP & DBpedia 2014 @ISWC2014

Tue Jun 10 21:04:04 UTC 2014

NLP & DBpedia 2014 - Second Call for Papers

2nd International Workshop on NLP & DBpedia 2014

19 or 20 October, 2014
Riva del Garda, Italy
Collocated with the 13th International Semantic Web Conference (ISWC2014).

Submission Deadline: 7 July 2014
Notification of Acceptance: 30 July 2014

Workshop URI: http://nlp-dbpedia2014.blogs.aksw.org/
Submissions via: https://www.easychair.org/conferences/?conf=nlpdbpedia2014
Hashtag: #NLPDBP2014
Contact: nlpdbpedia2014 at easychair.org

Motivation
The DBpedia community has recently experienced an immense increase in 
activity. We believe that the time has come to explore the connection 
between DBpedia & Natural Language Processing (NLP) in unprecedented 
depth.

DBpedia has a long-standing tradition of providing useful data as well as 
a commitment to reliable Semantic Web technologies and living best 
practices. With the rise of Wikidata, DBpedia is gradually being relieved 
of the tedious extraction of data from Wikipedia’s infoboxes and can 
shift its focus to new challenges such as extracting information from 
the unstructured article text as well as becoming a testing ground for 
multilingual NLP methods.

The central role of Wikipedia (and therefore DBpedia) for the creation 
of a Translingual Web has recently been recognized by the Strategic 
Research Agenda 
(http://www.meta-net.eu/vision/reports/meta-net-sra-version_1.0.pdf cf. 
section 3.4, page 23) and most of the contributions of the recent 
Dagstuhl seminar on the Multilingual Semantic Web 
(http://www.dagstuhl.de/de/programm/kalender/semhp/?semnr=12362) also 
stress the role of Wikipedia for Multilingualism 
(http://drops.dagstuhl.de/opus/volltexte/2013/3788/pdf/dagrep_v002_i009_p015_s12362.pdf). 
As more and more language-specific chapters of DBpedia are created 
(currently 14 language editions), DBpedia is becoming a driving factor 
for a Linguistic Linked Open Data cloud 
(http://linguistics.okfn.org/resources/llod/) as well as localized LOD 
clouds with specialized domains (e.g. the Dutch windmill domain ontology 
created from http://nl.dbpedia.org).

The data contained in Wikipedia and DBpedia have ideal properties for 
serving as a controlled testbed for NLP. Wikipedia and DBpedia are 
multilingual and multi-domain, the communities maintaining these 
resources are very open, and it is easy to join and contribute. The open 
licence allows data consumers to benefit from the content, and many parts 
are collaboratively editable. In particular, the data in DBpedia is widely 
used and disseminated throughout the Semantic Web.

We expect the workshop to produce the following outcomes:
• An open call to the DBpedia data consumer community will generate a 
wish list of data to be extracted from Wikipedia by NLP methods. This 
wish list will be broken down into tasks and benchmarks, and a gold 
standard will be created.
• The benchmarks and test data created will be collected and published 
under an open licence for future evaluation (inspired by 
http://oaei.ontologymatching.org/ and 
http://archive.ics.uci.edu/ml/datasets.html).

NLP4DBpedia
DBpedia has been around for quite a while, infusing the Web of Data with 
multi-domain data of decent quality. The data in DBpedia is, however, 
mostly extracted from Wikipedia infoboxes, while the remaining parts of 
Wikipedia are to a large extent not exploited for DBpedia. Here, NLP 
techniques may help improve DBpedia.

Extracting additional triples from the plain text information in 
Wikipedia, either unsupervised or using the existing triples as training 
information, could multiply the information in DBpedia, or help tell 
correct from incorrect information by finding supporting text passages. 
Furthermore, analyzing the semantics of other structures in Wikipedia, 
such as tables, list pages, or categories, would help make DBpedia 
richer. Finally, since Wikipedia exists in more than 200 languages, we 
are particularly interested in NLP approaches that work not only for 
English but also for other languages, in order to leverage the huge 
amount of knowledge captured in the different language editions.

DBpedia4NLP
On the other hand, NLP and information extraction techniques often 
involve various resources while processing texts from different domains. 
As high-quality annotated data is often too expensive and time-consuming 
to obtain, NLP researchers are looking to external structured sources to 
complement their datasets. Such resources can be gazetteers to aid a 
named entity recognition system or examples of relations between 
entities to bootstrap a relation finder. DBpedia can easily be utilised 
to assist NLP modules in a variety of tasks.

We invite papers from both of these areas, including:
• Knowledge extraction from text and HTML documents (especially 
unstructured and semi-structured documents) on the Web, using 
information in the Linked Open Data (LOD) cloud, and especially in DBpedia.
• Representation of NLP tool output and NLP resources as RDF/OWL, and 
linking the extracted output to the LOD cloud.
• Novel applications using the extracted knowledge, the Web of Data or 
NLP DBpedia-based methods.

Topics include, but are not limited to:

• Improving DBpedia with NLP methods
• Finding errors in DBpedia with NLP methods
• Annotation methods for Wikipedia articles
• Cross-lingual data and text mining on Wikipedia
• Pattern and semantic analysis of natural language, reading the Web, 
learning by reading
• Large-scale information extraction
• Entity resolution and automatic discovery of Named Entities
• Multilingual recognition of real-world entities
• Frequent pattern analysis of entities
• Relationship extraction, slot filling
• Entity linking, Named Entity disambiguation, cross-document 
co-reference resolution
• Disambiguation through knowledge base
• Ontology representation of natural language text
• Analysis of ontology models for natural language text
• Learning and refinement of ontologies
• Natural language taxonomies modeled to Semantic Web ontologies
• Use cases of entity recognition for Linked Data applications
• Impact of entity linking on information retrieval, semantic search

Furthermore, an informal list of NLP tasks can be found on this 
Wikipedia page: 
http://en.wikipedia.org/wiki/Natural_language_processing#Major_tasks_in_NLP
These are relevant for the workshop as long as they fit into the 
DBpedia4NLP and NLP4DBpedia frame (i.e. the data used revolves around 
Wikipedia and DBpedia).

Workshop format
The workshop will actively encourage collaborative participation: for 
example, live minutes of the workshop will be taken using an open 
EtherPad. We plan to collect the material used by each submission, such 
as the datasets used and the source code, and to share it with the whole 
community using a portal such as CKAN. Moreover, we intend to give the 
attendees a big picture of the workshop day and to focus the discussion 
on filling in the topics highlighted on the Knowledge Extraction 
Wikipedia page. Participants are also encouraged to extend the Wikipedia 
page.

Submissions
All papers must represent original and unpublished work that is not 
currently under review. Papers will be evaluated according to their 
significance, originality, technical content, style, clarity, and 
relevance to the workshop. At least one author of each accepted paper is 
expected to attend the workshop. Accepted papers will be published 
through CEUR-WS.

We welcome the following types of contributions:

• Full research papers (up to 12 pages)
• Position papers (up to 6 pages)
• Use case descriptions (up to 6 pages)
• Data/benchmark papers (2-6 pages, depending on the size and complexity)

Formatting Guidelines

All submissions must be written in English and must be formatted 
according to the style for Lecture Notes in Computer Science (LNCS) 
Authors. Please submit your contributions electronically in PDF format 
to https://www.easychair.org/conferences/?conf=nlpdbpedia2014

For details on the LNCS style, see the Springer Author Instructions at 
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0. NLP & 
DBpedia 2014 submissions are not anonymous.

Important Dates

- submission date: 7 July 2014, 23:59 Hawaii time
- author notifications: 30 July 2014, 23:59 Hawaii time
- camera-ready: 20 August 2014, 23:59 Hawaii time
- NLP & DBpedia 2014: 19 or 20 October 2014

Organizing committee
• Heiko Paulheim, University of Mannheim
• Marieke van Erp, VU University Amsterdam
• Agata Filipowska, Poznan University of Economics and I2G, Poznan
• Pablo N. Mendes, IBM Research, USA

Program committee
• Guadalupe Aguado, Universidad Politécnica de Madrid, Spain
• Christian Bizer, Universität Mannheim, Germany
• Volha Bryl, Universität Mannheim, Germany
• Martin Brümmer, Universität Leipzig, Germany
• Paul Buitelaar, DERI, National University of Ireland, Galway
• Philipp Cimiano, CITEC, Universität Bielefeld, Germany
• Jorge Gracia, Universidad Politécnica de Madrid, Spain
• Sebastian Hellmann, DBpedia Association, Germany
• Anja Jentzsch, Hasso-Plattner-Institut, Potsdam, Germany
• Dimitris Kontokostas, Universität Leipzig, Germany
• John McCrae, Universität Bielefeld, Germany
• Roberto Navigli, Sapienza, Università di Roma, Italy
• Simone Paolo Ponzetto, University of Mannheim
• Giuseppe Rizzo, Università di Torino, Italy
• Felix Sasaki, Deutsches Forschungszentrum für künstliche Intelligenz, 
Germany
• Ricardo Usbeck, AKSW, Universität Leipzig, Germany
• Rupert Westenthaler, Salzburg Research, Austria
• Feiyu Xu, Deutsches Forschungszentrum für künstliche Intelligenz, Germany

-- 
Dr. Heiko Paulheim
Research Group Data and Web Science
University of Mannheim
Phone: +49 621 181 2646
B6, 26, Room C1.08
D-68159 Mannheim

Mail: heiko at informatik.uni-mannheim.de
Web: www.heikopaulheim.com