[open-linguistics] LLOD cloud categories
Christian Chiarcos
christian.chiarcos at web.de
Fri Mar 28 21:13:29 UTC 2014
As mentioned in the last email, Bettina has summarized our discussions and
some feedback from LIDER project members and developed a small ontology of
linguistic categories.
Personally, I think it reflects what we discussed before relatively
faithfully. The main difference compared to the classification in the
current diagram (LEXICON, LANGUAGE_DESCRIPTION, CORPUS; also see
http://wiki.okfn.org/Llod-categories) is that the "language description"
group is broken up into "linguistic (language) data bases" and
"ontology", and that, in addition, we have an explicit "other" category.
I would, however, suggest replacing the label "ontology" with "linguistic
vocabulary" (i.e., a vocabulary of linguistically relevant terms), because
most of our resources (and practically every lexical-semantic resource)
are ontologies in a technical sense.
Furthermore, "Linguistic Data Category" should be labelled "Linguistic
Resource Type".
Beyond these minor adjustments, I see three potential problems with this
classification:
- The LEXICON group has grown disproportionately large since the last
diagram. We arrive at a more balanced picture if general knowledge bases
(DBpedia, Yago, Freebase -- unlike lexicons, they do not provide
grammatical, i.e., linguistic, information in a strict sense) are singled
out as, say, "semantic knowledge bases". This would resolve our
controversy as to whether these resources are actually linguistic in
nature (they are certainly linguistically/NLP-relevant).
- The diagram contains bibliographical DBs as linguistically relevant data
sets. They cannot be assigned to any category other than "other", but they
should probably receive a more consistent treatment. Formerly, these were
classified as "language description" (because they describe where to
locate language data).
- Splitting the old LANGUAGE_DESCRIPTION (which was relatively small in
the first place) into three sub-categories results in tiny clusters and
thereby marginalizes non-lexical data sets. From a presentational point
of view, this is clearly not desirable.
Any thoughts?
Best,
Christian
--
Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany
office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931