[open-linguistics] [Corpora-List] WordNet vs Ontology

Sebastian Hellmann hellmann at informatik.uni-leipzig.de
Fri Aug 8 07:35:29 UTC 2014


Dear all,
(I included some more lists to ping them, discussion started here: 
http://mailman.uib.no/public/corpora/2014-August/020939.html)

I see that there are many viewpoints on this issue in this thread.
So let me add my personal biased view.

In the broadest sense, we start to create an ontology by stating facts:

married (a, b) .

Imho we already have an ontology, solely because we relate a to b with 
"married". Even if no explicit ontology defines "married", the relation 
is still used in an "ontological" way, just not an explicit one. Other 
aspects that have been discussed throughout the literature are missing 
(e.g. the requirement by Gruber that an ontology must be "shared"), but 
in the broadest sense, it qualifies.
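
To make the point concrete, here is a minimal sketch in Python, assuming 
facts are stored as plain (predicate, subject, object) tuples. The helper 
names are hypothetical and only for illustration; no explicit definition 
of "married" exists anywhere, yet the relation is already usable.

```python
# Facts as (predicate, subject, object) tuples. Nothing here defines
# what "married" means; it is simply used to relate a to b.
facts = {
    ("married", "a", "b"),
}


def implicit_vocabulary(facts):
    """Collect the relation names the facts rely on, defined or not."""
    return {predicate for predicate, _, _ in facts}


def related(facts, predicate, subject):
    """Return everything `subject` is related to via `predicate`."""
    return {o for p, s, o in facts if p == predicate and s == subject}
```

Calling implicit_vocabulary(facts) yields the set of relations in use, 
which is the implicit "ontology" in the broad sense described above.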

Regarding language technology and this discussion, I would say that we 
should be careful not to mix levels. Lexical-semantic resources such as 
WordNet do mix them, but we could separate them again.

In my view, we have these different layers:

1. the content, i.e. the characters (HTML, plain text), e.g. in Unicode
2. the container of the content, e.g. a document or tweet
3. annotations on the content
4. metadata on the container, e.g. the tweeter or author, for context
5. collections of content (with or without annotations), i.e. corpora
6. ontologies and data describing language, e.g. lexica, dictionaries, 
terminologies, etc., such as WordNet
7. factual databases including their taxonomies, e.g. the DBpedia 
knowledge graph http://dbpedia.org

(@John: I hope you are noticing that I am trying to keep all of it 
as underspecified as possible)
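
The layers above could be kept apart in code roughly as follows. This is 
only a sketch under my own assumptions: the class and field names are 
hypothetical, offsets are plain character positions, and layers 6 and 7 
are reduced to a dictionary and a set of triples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Annotation:
    """Layer 3: an annotation on a span of the content."""
    begin: int
    end: int
    label: str


@dataclass
class Document:
    """Layer 2: the container. It holds the content (layer 1), the
    annotations (layer 3) and metadata about itself (layer 4)."""
    content: str
    annotations: List[Annotation] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Corpus:
    """Layer 5: a collection of content, annotated or not."""
    documents: List[Document] = field(default_factory=list)


# Layer 6: data describing language (toy lexicon: lemma -> part of speech).
lexicon: Dict[str, str] = {"city": "noun"}

# Layer 7: factual data, kept separate from the lexicon.
knowledge: Set[Tuple[str, str, str]] = {("Leipzig", "type", "City")}


def annotated_text(doc: Document, ann: Annotation) -> str:
    """Return the characters of the content that an annotation covers."""
    return doc.content[ann.begin:ann.end]


doc = Document(
    content="Leipzig is a city.",
    annotations=[Annotation(begin=0, end=7, label="Place")],
    metadata={"author": "example"},
)
corpus = Corpus(documents=[doc])
```

The design choice is simply that no class reaches across layers: the 
annotation knows offsets, not facts, and the lexicon knows words, not 
documents.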

Then in addition, there are ontologies on a meta-level that try to 
capture all seven layers. Some examples (more below): NIF, lemon, ITS, 
NERD [1], which we are trying to combine in http://nlp2rdf.org and 
http://lider-project.eu

We can model WordNet using the lemon ontology: 
http://datahub.io/dataset/lemonwordnet
However, for certain purposes it makes sense to transform WordNet into 
a taxonomy, as YAGO does:
https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
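
As a rough illustration of such a transformation (not YAGO's actual 
procedure, and not real WordNet data), hypernym links between word 
senses can be reread as subclass statements. The toy dictionary and the 
function names below are assumptions made for this sketch.

```python
# Toy hypernym links: each entry maps a term to its direct hypernyms.
hypernyms = {
    "dog": ["canine"],
    "canine": ["carnivore"],
    "carnivore": ["animal"],
}


def to_taxonomy(hypernyms):
    """Reinterpret each hypernym link as a (child, subClassOf, parent) triple."""
    return {
        (child, "subClassOf", parent)
        for child, parents in hypernyms.items()
        for parent in parents
    }


def superclasses(taxonomy, cls):
    """Follow subClassOf links transitively and return all ancestors of cls."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for child, _, parent in taxonomy:
            if child == current and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


taxonomy = to_taxonomy(hypernyms)
```

Once the links are read as a taxonomy, questions such as "is every dog 
an animal?" become a lookup in superclasses(taxonomy, "dog").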

I am not fixed on any of the definitions I gave above, as I am aware 
that you can (and should!) transform one into the other (with some 
effort, e.g. corpora to dictionary, fact extraction, language generation).

If we are talking about extracting ontologies from text, there might be 
philosophically minded people who want to argue that the ontology is 
already in the text. The discussion can be endless if you take the wrong 
linguistic turn.

If we are focusing on engineering of information machines, then things 
are much clearer.

All the best,
Sebastian



[1] related to the different layers:
1. NIF: http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#
2. (there is a gap here; Dublin Core or FOAF are not enough imho)
3. a) MARL: http://www.gi2mo.org/marl/0.1/ns.html
   b) ITS: Docs: http://www.w3.org/TR/its20/ , RDF: http://www.w3.org/2005/11/its/rdf#
   c) OLIA: http://purl.org/olia/
4. a) Dublin Core: http://dublincore.org/documents/dcmi-terms/
   b) Prov-O: http://www.w3.org/TR/prov-o/
5. also NIF: http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#
6. lemon: http://lemon-model.net/
7. a) DCAT and DataId: http://wiki.dbpedia.org/coop/DataIDUnit
   b) NERD: http://nerd.eurecom.fr/ontology


On 08.08.2014 06:11, John F Sowa wrote:
> On 8/7/2014 10:57 PM, Ken Litkowski wrote:
>> It would seem to me that our goal should be a classification
>> of all existing things (not to exclude the narrower types).
>
> Yes, but note the slides I suggested in my first note:
>
>    http://www.jfsowa.com/talks/kdptut.pdf
>
> Slides 7 to 9:  Cyc project.  30 years of work (since 1984).
> After the first 25 years, 100 million dollars and 1000 person-years
> of work (one person-millennium!), 600,000 concepts, defined by
> 5,000,000 axioms, organized in 6,000 microtheories -- and counting.
>
> Slide 10:  2300 years of universal ontology schemes -- and counting.
>
>> The Brandeis Shallow Ontology attempts to do this, and incidentally
>> is being used to characterize arguments of verbs in Patrick Hanks
>> corpus pattern analysis, i.e., in the imperfect world of language.
>
> I strongly believe in shallow, underspecified ontologies -- especially
> when they're supplemented with lots of lexical information about verbs
> and their characteristic patterns.
>
> But I also believe that the key to having an open-ended variety of
> specialized ontologies is to make the computers do what people do:
> extend their ontologies automatically by reading books.
>
> Lenat made the mistake of assuming that you need to hand-code
> a huge amount of knowledge before a system can start to read
> by itself.  But that's wrong.  You need to design a system that
> can automatically augment its ontology every step of the way.
>
> John


-- 
Sebastian Hellmann
AKSW/NLP2RDF research group
Institute for Applied Informatics (InfAI) and DBpedia Association
Events:
* Sept. 1-5, 2014: Conference Week in Leipzig, including
** Sept. 2nd: MLODE 2014 <http://mlode2014.nlp2rdf.org/>
** Sept. 3rd: 2nd DBpedia Community Meeting <http://wiki.dbpedia.org/meetings/Leipzig2014>
** Sept. 4th-5th: SEMANTiCS (formerly i-SEMANTICS) <http://semantics.cc/>
Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Thesis:
http://tinyurl.com/sh-thesis-summary
http://tinyurl.com/sh-thesis