[open-linguistics] ANN: NLP Interchange Format (NIF) 1.0 Spec, Demo and Reference Implementation

Sebastian Hellmann hellmann at informatik.uni-leipzig.de
Tue Nov 29 11:29:15 UTC 2011


Hi Emily,
(I finished this mail just as Christian's answer arrived, so I have 
not yet read it.)

NIF was developed for the Web and the Semantic Web community. The first 
major difference is that it is designed to be *really* simple to 
implement and use.  We did a field study with several students, who 
implemented 6 NIF adapters. They needed 24.3 hours per wrapper (this 
includes getting acquainted with the tool and with NIF and the actual 
implementation time) on average and they used less than 400 lines of 
code.  We are even thinking about simplifying it further. This triple 
is already a complete, valid and understandable annotation:

<http://www.w3.org/DesignIssues/LinkedData.html#offset_22849_22852_W3C> <http://ns.aksw.org/scms/means> 
<http://dbpedia.org/resource/World_Wide_Web_Consortium> .

No further information is needed. The triple links a part of the web 
document (characters 22849 to 22852) to DBpedia.
NIF goes in the direction of Web Annotation [1]: the provided URIs can 
be used so that people select text in a browser, right-click, and add 
comments or annotations (see also the ReframeIt demo [2]).
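
As a rough illustration of the offset-based URIs used above, here is a 
minimal Python sketch that builds such a URI and the corresponding 
scms:means triple. The function names, the sample text and the 
example.org document URI are hypothetical, and percent-encoding the 
anchor text is an assumption on my part; the NIF 1.0 spec is the 
authority on the exact URI recipe.

```python
from urllib.parse import quote

# Predicate taken from the triple in this mail.
SCMS_MEANS = "http://ns.aksw.org/scms/means"


def offset_uri(document_uri, text, begin, end):
    """Build an offset-style fragment URI for text[begin:end].

    Assumption: the anchor text is percent-encoded before it is appended.
    """
    anchor = quote(text[begin:end], safe="")
    return f"{document_uri}#offset_{begin}_{end}_{anchor}"


def annotation_triple(string_uri, entity_uri):
    """Serialize a single N-Triples line linking a string to an entity."""
    return f"<{string_uri}> <{SCMS_MEANS}> <{entity_uri}> ."


# Hypothetical document and text, for illustration only.
text = "The W3C publishes Web standards."
uri = offset_uri("http://example.org/doc.html", text, 4, 7)
triple = annotation_triple(
    uri, "http://dbpedia.org/resource/World_Wide_Web_Consortium"
)
print(triple)
```

The same function applied to the real document and the offsets 22849 and 
22852 would yield the subject URI shown in the triple above, provided 
the characters at those positions are "W3C".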

On the other hand, it also works the other way round to model provenance:

<http://dbpedia.org/resource/World_Wide_Web_Consortium> 
<http://somevocab.org/mentionedIn> <http://www.w3.org/DesignIssues/LinkedData.html#offset_22849_22852_W3C> .

NIF does not aim to model linguistic annotations "correctly", as GrAF 
tries to do. In NIF only one truth can be represented (no alternative 
annotations). Annotations are also mingled together rather than strictly 
separated: for example, you could query for all str:Strings that are 
olia:CommonNouns and dbp-owl:Persons at the same time. "Correctly" 
modelled, String, Noun and Person would need to be disjoint.
NIF allows this because it makes for easy and fast (SPARQL) queries and 
is sufficient to cover many use cases on the Web.
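
To make the "mingled" typing concrete, here is a small Python sketch 
that mimics such a query over an in-memory list of triples instead of a 
SPARQL endpoint. The sample triples and the example.org URIs are made 
up, and the expansions of olia: and dbp-owl: to the namespaces below are 
my assumption; only the idea of one string carrying both types comes 
from the text above.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumed namespace expansions for olia:CommonNoun and dbp-owl:Person.
COMMON_NOUN = "http://purl.org/olia/olia.owl#CommonNoun"
PERSON = "http://dbpedia.org/ontology/Person"

# Hypothetical annotations on a hypothetical document.
DOC = "http://example.org/doc.html#"
triples = [
    (DOC + "offset_4_10_author", RDF_TYPE, COMMON_NOUN),
    (DOC + "offset_4_10_author", RDF_TYPE, PERSON),
    (DOC + "offset_17_22_paper", RDF_TYPE, COMMON_NOUN),
]


def subjects_with_types(triples, *types):
    """Return the subjects that carry every one of the given rdf:types."""
    types_by_subject = {}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types_by_subject.setdefault(subject, set()).add(obj)
    wanted = set(types)
    return sorted(s for s, found in types_by_subject.items() if wanted <= found)


matches = subjects_with_types(triples, COMMON_NOUN, PERSON)
print(matches)
```

Because both types hang directly off the same string URI, the lookup is 
a plain intersection with no extra layer of annotation nodes to traverse.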

I do not have a complete overview of GrAF, but NIF also reuses existing 
ontologies such as OLiA [4] and NERD [3] (in the future maybe lemon [5] 
and others).
We call this conceptual interoperability [6], and it is mandatory for 
tools (servers) that produce NIF to disambiguate tags with the help of a 
reference ontology:
Compare "http://purl.org/olia/penn.owl#JJ" with just "JJ" as an annotation.
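
A minimal Python sketch of that disambiguation step, assuming a wrapper 
that receives bare Penn tags from its tool. The function name is 
hypothetical, and simply appending the tag to the namespace is an 
assumption that holds for "JJ" as shown above but not necessarily for 
tags containing characters that are awkward in URIs.

```python
# Namespace taken from the example URI in this mail.
PENN_NS = "http://purl.org/olia/penn.owl#"


def qualify_tag(tag, namespace=PENN_NS):
    """Turn a bare tag into a URI in the reference ontology.

    Tags that are already absolute http(s) URIs are returned unchanged.
    """
    if tag.startswith(("http://", "https://")):
        return tag
    return namespace + tag


print(qualify_tag("JJ"))
```

A consumer that sees the full URI can look the tag up in the ontology, 
whereas the bare string "JJ" is only meaningful if the tagset is known 
in advance.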

So I think the approaches are quite complementary. It would be perfect 
if there were some way to have transformations between GrAF and NIF.
All the best,
Sebastian


[1] http://en.wikipedia.org/wiki/Web_annotation
[2] http://reframeit.com/documents/london-homeless-law
[3] http://www.eurecom.fr/util/publidownload.en.htm?id=3515
[4] http://nachhalt.sfb632.uni-potsdam.de/owl/
[5] http://lexinfo.net/
[6] http://nlp2rdf.org/nif-1-0#toc-conceptual-interoperability

On 11/28/2011 09:59 PM, Emily M. Bender wrote:
> Dear Sebastian,
>
> How does NIF relate to/compare to GrAF?
>
> Emily
>
> On Sun, Nov 27, 2011 at 11:34 PM, Sebastian Hellmann
> <hellmann at informatik.uni-leipzig.de>  wrote:
>> The Natural Language Processing Interchange Format (NIF) is an RDF/OWL-based
>> format that aims to achieve interoperability between Natural Language
>> Processing (NLP) tools, language resources and annotations. The core of NIF
>> consists of a vocabulary, which can represent Strings as RDF resources. A
>> special URI Design is used to pinpoint annotations to a part of a document.
>> These URIs can then be used to attach arbitrary annotations to the
>> respective character sequence. Employing these URIs, annotations can be
>> published on the Web as Linked Data and interchanged between different NLP
>> tools and applications.
>>
>> In order to simplify the combination of tools, improve their
>> interoperability and facilitate the use of Linked Data, we developed the
>> NLP Interchange Format (NIF). NIF addresses the interoperability problem on
>> three layers: the structural, conceptual and access layer. NIF is based on a
>> Linked Data enabled URI scheme for identifying elements in (hyper-) texts
>> (structural layer) and a comprehensive ontology for describing common NLP
>> terms and concepts (conceptual layer). NIF-aware applications will produce
>> output (and possibly also consume input) adhering to the NIF ontology as
>> REST services (access layer). Unlike more centralized solutions such as
>> UIMA and GATE, NIF enables the creation of heterogeneous, distributed and
>> loosely coupled NLP applications, which use the Web as an integration
>> platform. Another benefit is that a NIF wrapper has to be created only once
>> for a particular tool, but enables the tool to interoperate with a
>> potentially large number of other tools without additional adaptations.
>> Ultimately, we envision an ecosystem of NLP tools and services emerging
>> that uses NIF for exchanging and integrating rich annotations.
>>
>> We designed NIF to be very light-weight and to reduce the number of triples
>> to achieve better scalability. The following triples in N3 Syntax express
>> that the string “W3C” on http://www.w3.org/DesignIssues/LinkedData.html
>> (index 22849 to 22852) is linked to the DBpedia resource of
>> “World_Wide_Web_Consortium”:
>>
>> @prefix ld:<http://www.w3.org/DesignIssues/LinkedData.html#>  .
>> @prefix str:<http://nlp2rdf.lod2.eu/schema/string/>  .
>> @prefix dbo:<http://dbpedia.org/ontology/>  .
>> @prefix dbpedia:<http://dbpedia.org/resource/>  .
>> @prefix scms:<http://ns.aksw.org/scms/>  .
>> @prefix nerd:<http://nerd.eurecom.fr/ontology#>  .
>> ld:offset_22849_22852_W3C str:anchorOf "W3C" .
>> ld:offset_22849_22852_W3C scms:means dbpedia:World_Wide_Web_Consortium .
>> ld:offset_22849_22852_W3C a dbo:Organisation , nerd:Organization .
>>
>> NIF already incorporates the Ontologies of Linguistic Annotation (OLiA,
>> http://nachhalt.sfb632.uni-potsdam.de/owl/) and the Named Entity Recognition
>> and Disambiguation (NERD, http://nerd.eurecom.fr/ontology/) ontology. Please
>> get in contact, if you know of further NLP ontologies, which we can reuse
>> and integrate in NIF.
>>
>> This release consists of the following items:
>> 1. The specification of NIF 1.0 ( http://nlp2rdf.org/nif-1-0 ) This document
>> will guide the further implementation of NIF-enabled services. An average
>> wrapper requires around 200-500 lines of code. The spec integrates several
>> domain ontologies (OLiA, NERD) and will be extended in the future to cover
>> more domains.
>> 2. A community portal ( http://nlp2rdf.org )
>> -- mailing list (nlp2rdf at lists.informatik.uni-leipzig.de ) -
>> http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf
>> -- Read how to get involved (http://nlp2rdf.org/get-involved )
>> 3. A reference implementation of NIF 1.0 in Java
>> -- Release 1.2 (
>> http://code.google.com/p/nlp2rdf/downloads/detail?name=nlp2rdf-1.2.tar.gz )
>> -- Source code ( http://code.google.com/p/nlp2rdf/ )
>> 4. Wrapper implementations for Stanford CoreNLP, SnowballStemmer, OpenNLP,
>> MontyLingua, DBpedia Spotlight, UIMA, Gate (for ANNIE and also generic
>> output), Mallet (alpha)
>> -- Demo GUI (with links to implementations): http://nlp2rdf.lod2.eu/demo.php
>> -- List of implementations: http://nlp2rdf.org/implementations
>> 5. Tutorials and Tutorial Challenges (
>> http://nlp2rdf.org/tutorials-challenge )
>> -- Tutorial: How to call a NIF web service with your favorite SemWeb library
>> -
>> http://nlp2rdf.org/tutorials/tutorial-how-to-call-a-nif-webservice-with-your-favorite-semweb-library
>> -- Tutorial Challenge: Semantic Search -
>> http://nlp2rdf.org/tutorial-challenges/tutorial-challenge-semantic-search/
>> -- Tutorial Challenge: Multilingual Part-Of-Speech Tagger -
>> http://nlp2rdf.org/tutorial-challenges/tutorial-challenge-multilingual-part-of-speech-tagger
>> -- Tutorial Challenge: Semantic Yellow Pages -
>> http://nlp2rdf.org/tutorial-challenges/tutorial-challenge-semantic-yellow-pages
>> 6. Slides - http://www.slideshare.net/kurzum/nif-version-10
>> 7. A technical report http://svn.aksw.org/papers/2012/WWW_NIF/public.pdf
>> including some evaluation.
>>
>> We would like to thank our colleagues from AKSW (http://aksw.org) research
>> group and the LOD2 (http://lod2.eu) project for their helpful comments and
>> inspiring discussions during the development of NIF. Especially, we would
>> like to thank Christian Chiarcos
>> (http://www.sfb632.uni-potsdam.de/~chiarcos/) for his support while using
>> OLiA, the members of the Working Group on Open Data in Linguistics
>> (http://linguistics.okfn.org/) and the students that participated in the NIF
>> field study: Markus Ackermann, Martin Brümmer, Didier Cherix, Marcus
>> Nitzschke, Robert Schulze.
>>
>> Regards,
>> Sebastian Hellmann, Jens Lehmann and Sören Auer
>>
>> _______________________________________________
>> open-linguistics mailing list
>> open-linguistics at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-linguistics
>>
>
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
