[open-linguistics] ANN: NLP Interchange Format (NIF) 1.0 Spec, Demo and Reference Implementation

Mon Nov 28 07:34:49 UTC 2011

The Natural Language Processing Interchange Format (NIF) is an 
RDF/OWL-based format that aims to achieve interoperability between 
Natural Language Processing (NLP) tools, language resources and 
annotations. The core of NIF consists of a vocabulary that can 
represent strings as RDF resources. A special URI design is used to 
pinpoint annotations to a part of a document. These URIs can then be 
used to attach arbitrary annotations to the respective character 
sequence. Employing these URIs, annotations can be published on the Web 
as Linked Data and interchanged between different NLP tools and 
applications.

In order to simplify the combination of tools, improve their 
interoperability and facilitate the use of Linked Data, we developed 
the NLP Interchange Format (NIF). NIF addresses the interoperability 
problem on three layers: the structural, conceptual and access layer. 
NIF is based on a Linked Data enabled URI scheme for identifying 
elements in (hyper-) texts (structural layer) and a comprehensive 
ontology for describing common NLP terms and concepts (conceptual 
layer). NIF-aware applications will produce output (and possibly also 
consume input) adhering to the NIF ontology as REST services (access 
layer). Unlike more centralized solutions such as UIMA and GATE, NIF 
enables the creation of heterogeneous, distributed and loosely coupled 
NLP applications, which use the Web as an integration platform. Another 
benefit is that a NIF wrapper has to be created only once for a 
particular tool, but enables the tool to interoperate with a potentially 
large number of other tools without additional adaptations. Ultimately, 
we envision an 
ecosystem of NLP tools and services to emerge using NIF for exchanging 
and integrating rich annotations.

We designed NIF to be very lightweight and to reduce the number of 
triples to achieve better scalability. The following triples in N3 
syntax express that the string “W3C” on 
http://www.w3.org/DesignIssues/LinkedData.html (index 22849 to 22852) is 
linked to the DBpedia resource of “World_Wide_Web_Consortium”:

@prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
@prefix str: <http://nlp2rdf.lod2.eu/schema/string/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix scms: <http://ns.aksw.org/scms/> .
@prefix nerd: <http://nerd.eurecom.fr/ontology#> .
ld:offset_22849_22852_W3C str:anchorOf "W3C" .
ld:offset_22849_22852_W3C scms:means dbpedia:World_Wide_Web_Consortium .
ld:offset_22849_22852_W3C a dbo:Organisation , nerd:Organization .
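
The offset-based URIs in the triples above can be generated mechanically 
from a document URI and a pair of character indices. The following Python 
sketch mirrors the pattern visible in the example (offset_BEGIN_END_STRING, 
with END exclusive, so that END minus BEGIN equals the string length). The 
function names offset_uri, escape_literal and annotation_triples are 
hypothetical, and percent-encoding the anchored string is an assumption; 
the NIF 1.0 specification remains the authoritative reference for the 
exact URI recipe.

```python
from typing import List
from urllib.parse import quote

# Property URIs taken from the prefixes used in the N3 example above.
STR_ANCHOR_OF = "http://nlp2rdf.lod2.eu/schema/string/anchorOf"
SCMS_MEANS = "http://ns.aksw.org/scms/means"


def offset_uri(document_uri: str, text: str, begin: int, end: int) -> str:
    """Build an offset-style URI for text[begin:end] (end is exclusive)."""
    if not 0 <= begin < end <= len(text):
        raise ValueError(
            f"invalid offsets {begin}..{end} for text of length {len(text)}"
        )
    anchor = text[begin:end]
    # Assumption: the anchored string is percent-encoded so that it is
    # safe inside a URI fragment.
    return f"{document_uri}#offset_{begin}_{end}_{quote(anchor, safe='')}"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and line breaks for a quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def annotation_triples(
    document_uri: str, text: str, begin: int, end: int, entity_uri: str
) -> List[str]:
    """Return two triples: the anchored string and the linked entity."""
    subject = offset_uri(document_uri, text, begin, end)
    anchor = escape_literal(text[begin:end])
    return [
        f'<{subject}> <{STR_ANCHOR_OF}> "{anchor}" .',
        f"<{subject}> <{SCMS_MEANS}> <{entity_uri}> .",
    ]


# Usage with a made-up document URI and text (not the real W3C page).
sample_text = "The W3C publishes standards."
for triple in annotation_triples(
    "http://example.org/doc.txt",
    sample_text,
    4,
    7,
    "http://dbpedia.org/resource/World_Wide_Web_Consortium",
):
    print(triple)
```

For the sample text, the subject URI ends in #offset_4_7_W3C. Full URIs 
in angle brackets are used instead of prefixed names so that each output 
line is valid without any @prefix declarations.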

NIF already incorporates the Ontologies of Linguistic Annotation (OLiA, 
http://nachhalt.sfb632.uni-potsdam.de/owl/) and the Named Entity 
Recognition and Disambiguation (NERD, http://nerd.eurecom.fr/ontology/) 
ontology. Please get in touch if you know of further NLP ontologies 
that we can reuse and integrate into NIF.

This release consists of the following items:
1. The specification of NIF 1.0 ( http://nlp2rdf.org/nif-1-0 ). This 
document will guide the further implementation of NIF-enabled services. 
An average wrapper requires around 200-500 lines of code. The spec 
integrates several domain ontologies (OLiA, NERD) and will be extended 
in the future to cover more domains.
2. A community portal ( http://nlp2rdf.org )
-- mailing list (nlp2rdf at lists.informatik.uni-leipzig.de ) - 
http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf
-- Read how to get involved (http://nlp2rdf.org/get-involved )
3. A reference implementation of NIF 1.0 in Java
-- Release 1.2 ( 
http://code.google.com/p/nlp2rdf/downloads/detail?name=nlp2rdf-1.2.tar.gz )
-- Source code ( http://code.google.com/p/nlp2rdf/ )
4. Wrapper implementations for Stanford CoreNLP, SnowballStemmer, 
OpenNLP, MontyLingua, DBpedia Spotlight, UIMA, GATE (for ANNIE and also 
generic output), Mallet (alpha)
-- Demo GUI (with links to implementations): http://nlp2rdf.lod2.eu/demo.php
-- List of implementations: http://nlp2rdf.org/implementations
5. Tutorials and Tutorial Challenges ( 
http://nlp2rdf.org/tutorials-challenge )
-- Tutorial: How to call a NIF web service with your favorite SemWeb 
library - 
http://nlp2rdf.org/tutorials/tutorial-how-to-call-a-nif-webservice-with-your-favorite-semweb-library
-- Tutorial Challenge: Semantic Search - 
http://nlp2rdf.org/tutorial-challenges/tutorial-challenge-semantic-search/
-- Tutorial Challenge: Multilingual Part-Of-Speech Tagger - 
http://nlp2rdf.org/tutorial-challenges/tutorial-challenge-multilingual-part-of-speech-tagger
-- Tutorial Challenge: Semantic Yellow Pages - 
http://nlp2rdf.org/tutorial-challenges/tutorial-challenge-semantic-yellow-pages
6. Slides - http://www.slideshare.net/kurzum/nif-version-10
7. A technical report, including some evaluation - 
http://svn.aksw.org/papers/2012/WWW_NIF/public.pdf

We would like to thank our colleagues from the AKSW (http://aksw.org) 
research group and the LOD2 (http://lod2.eu) project for their helpful 
comments and inspiring discussions during the development of NIF. 
In particular, we would like to thank Christian Chiarcos 
(http://www.sfb632.uni-potsdam.de/~chiarcos/) for his support in 
using OLiA, the members of the Working Group on Open Data in Linguistics 
(http://linguistics.okfn.org/) and the students who participated in the 
NIF field study: Markus Ackermann, Martin Brümmer, Didier Cherix, Marcus 
Nitzschke, Robert Schulze.

Regards,
Sebastian Hellmann, Jens Lehmann and Sören Auer