[open-linguistics] ANN: NLP Interchange Format (NIF) 1.0 Spec, Demo and Reference Implementation

Sebastian Hellmann hellmann at informatik.uni-leipzig.de
Tue Nov 29 15:00:42 UTC 2011


Yes, Emily sent it with another email address, I guess.
I had to manually confirm it today, but the email is dated 
yesterday (11/28/2011 09:59 PM), so it should be sorted accordingly in 
your email client, albeit pushed down by intervening emails.
Sebastian



On 11/29/2011 03:50 PM, Nancy Ide wrote:
> Hi Christian and all,
>
> Thanks for this nice and extensive explanation! There are a few comments I would make (some because there are some misconceptions and inaccuracies about GrAF, as there have been changes that are not yet published), but maybe this can wait until March at the workshop. I also think there are some "big picture" notions missing here, and as a result, I've decided to change the topic of my invited talk in March to address this question of relationship among various schemes and approaches, which I hope will be of interest (and spark a lot of discussion I'm sure!).
>
> BTW I did not receive the original query from Emily--was it on this list?
>
> Best,
> Nancy
>
>
>
> On Nov 29, 2011, at 5:46 AM, Christian Chiarcos wrote:
>
>> Dear Emily,
>>
>> (I'm answering for Sebastian, because we're working together on scaling up
>> NIF to a more generic formalism.)
>>
>> The key difference is a difference of technologies employed: the standard
>> linearization of GrAF is by means of standoff XML (with known
>> disadvantages such as limited readability and the need to develop an
>> infrastructure [API, data bases, query language, etc.] from scratch),
>> whereas NIF uses RDF and OWL (not necessarily better readable, but with
>> infrastructures provided by the Semantic Web community). Also, both differ
>> in scope: The goal behind NIF was to develop NLP pipelines that make use
>> of RDF as interchange format, whereas GrAF is primarily concerned with
>> modeling linguistic corpora. Further, GrAF only allows modeling a single
>> resource, whereas RDF also allows interlinking it with other resources
>> (e.g., for PropBanking or parallel corpora). Both approaches complement each other, and
>> it is possible that they converge in the future to a certain extent -
>> maybe, this could be a topic to discuss at the LDL where we invited Nancy
>> Ide as keynote speaker.
>>
>> On the more detailed level, NIF is a relatively pragmatic approach to
>> represent the output of several typical NLP tools, i.e., tokenizers,
>> lemmatizers, POS taggers, and constituent parsers, by means of RDF.
>> (Higher-level annotations, say, alignment, bridging, or discourse
>> structure, cannot be adequately represented by NIF at the moment.) As
>> compared to this, GrAF was developed as a means to represent any kind of
>> linguistic annotation, and in particular, annotated corpora. So, it comes
>> with a certain theoretical overhead. GrAF is based on a graph-theoretic
>> model and just uses labeled directed graphs as data model (without any
>> more specific constraints on the data model). Of course, RDF can also be
>> understood as being based on labeled directed graphs. In that sense, both
>> are building on similar conceptions of linguistic annotations as directed
>> (acyclic) (hyper-) graphs, as others did before (cf. Bird & Liberman 2001).
>>
>> NIF comprises means to address strings (this roughly corresponds to the
>> segmentation in LAF, which is, however, not directly a part of GrAF [Ide &
>> Suderman 2007 describe the introduction of "dummy nodes" which do not seem
>> to be proper nodes in the GrAF graph]). Unlike in GrAF, these
>> strings are addressed by URIs, which allows references from other
>> resources directly into a corpus/document.
>>
>> Further, NIF contains the String Ontology and the Structured Sentence
>> Ontology that provide data types for annotation, but from a very
>> surface-oriented perspective (Sentences, Phrases and Words as defined in
>> the Structured Sentence Ontology serve as units of annotation, all would
>> be specializations of a GrAF node; a String as defined in the String
>> Ontology, however, would be more general than a GrAF node, because a GrAF
>> node is - implicitly - required to be a meaningful unit of linguistic
>> annotation, whereas a string is just a sequence of characters.) One
>> important difference is that a GrAF node can be discontinuous, whereas a
>> NIF String is continuous.
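
The continuous/discontinuous distinction above can be illustrated with a minimal Python sketch. The class names `ContinuousString` and `DiscontinuousNode` are hypothetical and are not part of NIF or GrAF; they only model "one character span" versus "several character spans", assuming half-open character offsets.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Half-open character offsets [begin, end) into a document string.
Span = Tuple[int, int]


@dataclass(frozen=True)
class ContinuousString:
    """Hypothetical stand-in for a continuous string: exactly one span."""
    begin: int
    end: int

    def text(self, doc: str) -> str:
        return doc[self.begin:self.end]


@dataclass(frozen=True)
class DiscontinuousNode:
    """Hypothetical stand-in for a node that may cover several spans."""
    spans: Tuple[Span, ...]

    def text(self, doc: str, sep: str = " ") -> str:
        # Join the covered pieces in the order given.
        return sep.join(doc[b:e] for b, e in self.spans)

    def as_continuous(self) -> List[ContinuousString]:
        # A discontinuous unit can only be approximated by several
        # continuous strings, one per span.
        return [ContinuousString(b, e) for b, e in self.spans]


doc = "Er ruft sie an"
# German separable verb "anrufen": "ruft" and "an" form one unit.
verb = DiscontinuousNode(spans=((3, 7), (12, 14)))
print(verb.text(doc))
print([s.text(doc) for s in verb.as_continuous()])
```

Under this reading, a discontinuous unit such as a separable verb needs more than one continuous string plus some additional link between them.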
>>
>> An important practical difference is that NIF handles not only structural
>> interoperability (comparable formats) for linguistic annotations, but also
>> conceptual interoperability (interpretable annotations), *within the same
>> formalism*. In GrAF, ISOcat can be used to define reference categories,
>> but at the moment, the integration of ISOcat in GrAF does not seem to be
>> solved. (Although there may have been recent developments that I am not
>> aware of.) In NIF, the OLiA ontologies can be used which provide an OWL
>> wrapper for ISOcat (and other terminology repositories). Same for metadata
>> (using lexvo/lingvoj).
>>
>> So far, NIF covers only selected levels of annotation; for higher levels,
>> additional data types would have to be introduced, and these will be based
>> on the LAF, either directly on GrAF, or on its sibling format PAULA
>> (originating from early LAF sketches, Ide & Romary 2004). At least two
>> people (Nancy Ide and myself) wrote earlier on the list that they could
>> provide RDF representations of such generic formalisms, and this
>> information will be the input to carry the development of NIF further.
>>
>> In the longer perspective, we plan to develop means to represent
>> linguistic annotations such that GrAF annotations can be rendered
>> appropriately in RDF and/or OWL, with NIF for NLP pipelines and POWLA for
>> linguistic corpora.
>>
>> Best,
>> Christian
>>
>> On Mon, 28 Nov 2011 21:59:51 +0100, Emily M. Bender <ebender at uw.edu> wrote:
>>
>>> Dear Sebastian,
>>>
>>> How does NIF relate to/compare to GrAF?
>>>
>>> Emily
>>>
>>> On Sun, Nov 27, 2011 at 11:34 PM, Sebastian Hellmann
>>> <hellmann at informatik.uni-leipzig.de> wrote:
>>>> The Natural Language Processing Interchange Format (NIF) is an RDF/OWL-based
>>>> format that aims to achieve interoperability between Natural Language
>>>> Processing (NLP) tools, language resources and annotations. The core of NIF
>>>> consists of a vocabulary, which can represent Strings as RDF resources. A
>>>> special URI Design is used to pinpoint annotations to a part of a document.
>>>> These URIs can then be used to attach arbitrary annotations to the
>>>> respective character sequence. Employing these URIs, annotations can be
>>>> published on the Web as Linked Data and interchanged between different NLP
>>>> tools and applications.
>>>>
>>>> In order to simplify the combination of tools, improve their
>>>> interoperability and facilitate the use of Linked Data, we developed the
>>>> NLP Interchange Format (NIF). NIF addresses the interoperability problem on
>>>> three layers: the structural, conceptual and access layer. NIF is based on a
>>>> Linked Data enabled URI scheme for identifying elements in (hyper-) texts
>>>> (structural layer) and a comprehensive ontology for describing common NLP
>>>> terms and concepts (conceptual layer). NIF-aware applications will produce
>>>> output (and possibly also consume input) adhering to the NIF ontology as
>>>> REST services (access layer). Unlike more centralized solutions such as
>>>> UIMA and GATE, NIF enables the creation of heterogeneous, distributed and
>>>> loosely coupled NLP applications, which use the Web as an integration
>>>> platform. Another benefit is that a NIF wrapper has to be created only once
>>>> for a particular tool, but enables the tool to interoperate with a
>>>> potentially large number of other tools without additional adaptations.
>>>> Ultimately, we envision an ecosystem
>>>> of NLP tools and services to emerge using NIF for exchanging and integrating
>>>> rich annotations.
>>>>
>>>> We designed NIF to be very light-weight and to reduce the number of triples
>>>> to achieve better scalability. The following triples in N3 Syntax express
>>>> that the string “W3C” on http://www.w3.org/DesignIssues/LinkedData.html
>>>> (index 22849 to 22852) is linked to the DBpedia resource of
>>>> “World_Wide_Web_Consortium”:
>>>>
>>>> @prefix ld: <http://www.w3.org/DesignIssues/LinkedData.html#> .
>>>> @prefix str: <http://nlp2rdf.lod2.eu/schema/string/> .
>>>> @prefix dbo: <http://dbpedia.org/ontology/> .
>>>> @prefix dbpedia: <http://dbpedia.org/resource/> .
>>>> @prefix scms: <http://ns.aksw.org/scms/> .
>>>> @prefix nerd: <http://nerd.eurecom.fr/ontology#> .
>>>> ld:offset_22849_22852_W3C str:anchorOf "W3C" .
>>>> ld:offset_22849_22852_W3C scms:means dbpedia:World_Wide_Web_Consortium .
>>>> ld:offset_22849_22852_W3C a dbo:Organisation , nerd:Organization .
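
As a rough illustration of the URI pattern in the triples above, the following Python sketch builds an `offset_<begin>_<end>_<string>` fragment and two matching triples. It only mirrors the pattern visible in the example; the function names `offset_uri` and `annotation_triples`, the half-open offset convention, the percent-encoding of the anchor text, and the `example.org` document URI are assumptions, and the NIF 1.0 specification may impose further rules.

```python
from typing import List, Tuple
from urllib.parse import quote

# Property URIs taken from the prefixes in the N3 example above.
STR_ANCHOR_OF = "http://nlp2rdf.lod2.eu/schema/string/anchorOf"
SCMS_MEANS = "http://ns.aksw.org/scms/means"


def offset_uri(doc_uri: str, text: str, begin: int, end: int) -> str:
    """Build an offset-style fragment URI for text[begin:end].

    Assumes half-open character offsets (end is exclusive).
    """
    if not (0 <= begin < end <= len(text)):
        raise ValueError("offsets out of range")
    anchor = text[begin:end]
    # Percent-encode the anchor so it is safe inside a URI fragment
    # (an assumption of this sketch, not taken from the announcement).
    return f"{doc_uri}#offset_{begin}_{end}_{quote(anchor, safe='')}"


def annotation_triples(
    doc_uri: str, text: str, begin: int, end: int, entity_uri: str
) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) tuples for one annotation."""
    subject = offset_uri(doc_uri, text, begin, end)
    return [
        (subject, STR_ANCHOR_OF, text[begin:end]),
        (subject, SCMS_MEANS, entity_uri),
    ]


text = "The W3C publishes standards."
print(offset_uri("http://example.org/doc", text, 4, 7))
# http://example.org/doc#offset_4_7_W3C
for triple in annotation_triples(
    "http://example.org/doc", text, 4, 7,
    "http://dbpedia.org/resource/World_Wide_Web_Consortium",
):
    print(triple)
```

Because the subject URI encodes both the document and the character range, any other resource can point at exactly this substring without a separate identifier registry.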
>>>>
>>>> NIF already incorporates the Ontologies of Linguistic Annotation (OLiA,
>>>> http://nachhalt.sfb632.uni-potsdam.de/owl/) and the Named Entity Recognition
>>>> and Disambiguation (NERD, http://nerd.eurecom.fr/ontology/) ontology. Please
>>>> get in contact, if you know of further NLP ontologies, which we can reuse
>>>> and integrate in NIF.
>>>>
>>>> This release consists of the following items:
>>>> 1. The specification of NIF 1.0 ( http://nlp2rdf.org/nif-1-0 ) This document
>>>> will guide the further implementation of NIF-enabled services. An average
>>>> wrapper requires around 200-500 lines of code. The spec integrates several
>>>> domain ontologies (OLiA, NERD) and will be extended in the future to cover
>>>> more domains.
>>>> 2. A community portal ( http://nlp2rdf.org )
>>>> -- mailing list (nlp2rdf at lists.informatik.uni-leipzig.de ) -
>>>> http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf
>>>> -- Read how to get involved (http://nlp2rdf.org/get-involved )
>>>> 3. A reference implementation of NIF 1.0 in Java
>>>> -- Release 1.2 (
>>>> http://code.google.com/p/nlp2rdf/downloads/detail?name=nlp2rdf-1.2.tar.gz )
>>>> -- Source code ( http://code.google.com/p/nlp2rdf/ )
>>>> 4. Wrapper implementations for Stanford CoreNLP, SnowballStemmer, OpenNLP,
>>>> MontyLingua, DBpedia Spotlight, UIMA, Gate (for ANNIE and also generic
>>>> output), Mallet (alpha)
>>>> -- Demo GUI (with links to implementations): http://nlp2rdf.lod2.eu/demo.php
>>>> -- List of implementations: http://nlp2rdf.org/implementations
>>>> 5. Tutorials and Tutorial Challenges (
>>>> http://nlp2rdf.org/tutorials-challenge )
>>>> -- Tutorial: How to call a NIF web service with your favorite SemWeb library
>>>> -
>>>> http://nlp2rdf.org/tutorials/tutorial-how-to-call-a-nif-webservice-with-your-favorite-semweb-library
>>>> -- Tutorial Challenge: Semantic Search -
>>>> http://nlp2rdf.org/tutorial-challenges/tutorial-challenge-semantic-search/
>>>> -- Tutorial Challenge: Multilingual Part-Of-Speech Tagger -
>>>> http://nlp2rdf.org/tutorial-challenges/tutorial-challenge-multilingual-part-of-speech-tagger
>>>> -- Tutorial Challenge: Semantic Yellow Pages -
>>>> http://nlp2rdf.org/tutorial-challenges/tutorial-challenge-semantic-yellow-pages
>>>> 6. Slides - http://www.slideshare.net/kurzum/nif-version-10
>>>> 7. A technical report http://svn.aksw.org/papers/2012/WWW_NIF/public.pdf
>>>> including some evaluation.
>>>>
>>>> We would like to thank our colleagues from AKSW (http://aksw.org) research
>>>> group and the LOD2 (http://lod2.eu) project for their helpful comments and
>>>> inspiring discussions during the development of NIF. Especially, we would
>>>> like to thank Christian Chiarcos
>>>> (http://www.sfb632.uni-potsdam.de/~chiarcos/) for his support while using
>>>> OLiA, the members of the Working Group on Open Data in Linguistics
>>>> (http://linguistics.okfn.org/) and the students that participated in the NIF
>>>> field study: Markus Ackermann, Martin Brümmer, Didier Cherix, Marcus
>>>> Nitzschke, Robert Schulze.
>>>>
>>>> Regards,
>>>> Sebastian Hellmann, Jens Lehmann and Sören Auer
>> _______________________________________________
>> open-linguistics mailing list
>> open-linguistics at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-linguistics
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org




