[open-linguistics] linguistic relevance
Christian Chiarcos
chiarcos at informatik.uni-frankfurt.de
Sun Feb 22 13:43:11 UTC 2015
Dear all,
together with Bettina, we provide a formulation for a preliminary
consensus on linguistic relevance under
http://wiki.okfn.org/Wg/linguistics/llod-categories#Preliminary_Consensus
based on the discussion last Wednesday.
Since then, Jonathan added another suggestion, and I incorporated parts of
this into the earlier consensus formulation, marked as [addition1] and
[Condition2].
However, Jonathan's suggestion aims to provide an extensional criterion
for "linguistic resources (in a strict sense)" by focusing on annotated or
analyzed data. In contrast to language data, this makes perfect sense,
but in a broader context, it may actually be too narrow. What about
linguistic resources that do not contain primary data at all? This is the
case, e.g., for linguistic metadata collections, for standoff annotations
(where the annotations may be a resource physically separated from their
primary data), masked corpora
(http://u-002-ssfbv001.uni-tuebingen.de/sfb441/c2/paper3/paper3.html), or
typological databases. We could list these individually, but we might need
to extend this list from time to time: For example, we currently don't
have any psycholinguist among us, but I could imagine that they have even
further types of linguistic (not language) resources one would like to
include at some point -- but of a kind that most of us would not
immediately think of (experimental stimuli?, analyzed sensor data?). So,
that might not be the most sustainable definition.
Personally, I would like to stick with the definition of linguistic
resources in a strict sense based on the intention of their creator, i.e.,
his/her specialization, affiliation or associated
presentations/publications. This provides a very clear-cut, objective
criterion.
Obviously, this criterion is far too strict, but this should not be a
problem: we can easily include any resource we're interested in by
creating links with other LLOD resources, and if someone does, this would
very clearly demonstrate that it is a linguistic(ally relevant) resource.
Without links, it cannot be included in the LLOD cloud anyway (because it
violates the LOD criteria), so this should not be an obstacle, even without
publications or demonstrable specialization of its creator.
At the same time, it might encourage potential contributors to publish
data set descriptions or case studies using this data at conferences or
workshops. These will always have a place in the LDL and MLODE workshops,
so a dedicated publication channel would already be available, and as
these workshops are major activities of the OWLG along with LLOD cloud
development, this provides some degree of internal coherence between both
efforts.
Best,
Christian
--
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany
office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931