[open-linguistics] linguistic relevance

Sun Feb 22 13:43:11 UTC 2015

Dear all,

Bettina and I have provided a formulation for a preliminary
consensus on linguistic relevance under
http://wiki.okfn.org/Wg/linguistics/llod-categories#Preliminary_Consensus
based on the discussion last Wednesday.

Since then, Jonathan added another suggestion, and I incorporated parts of  
this into the earlier consensus formulation, marked as [addition1] and  
[Condition2].

However, Jonathan's suggestion aims to provide an extensional criterion  
for "linguistic resources (in a strict sense)" by focusing on annotated or  
analyzed data. In contrast to language data, this makes perfect sense,
but in a broader context, it may actually be too narrow. What about  
linguistic resources that do not contain primary data at all? This is the  
case, e.g., for linguistic metadata collections, for standoff-annotations  
(where annotations may be a resource physically separated from its primary  
data), masked corpora  
(http://u-002-ssfbv001.uni-tuebingen.de/sfb441/c2/paper3/paper3.html), or  
typological databases. We could list these individually, but we might need  
to extend this list from time to time: For example, we currently don't  
have any psycholinguist among us, but I could imagine that they have even  
further types of linguistic (not language) resources one would like to  
include at some point -- but of a kind that most of us would not  
immediately think of (experimental stimuli?, analyzed sensor data?). So,  
that might not be the most sustainable definition.

Personally, I would like to stick with the definition of linguistic  
resources in a strict sense based on the intention of their creator, i.e.,  
his/her specialization, affiliation or associated  
presentations/publications. This provides a very clear-cut, objective  
criterion.

Obviously, this criterion is far too strict, but this should not be a
problem: we can easily include any resource we're interested in by
creating links with other LLOD resources, and if someone does, this would
very clearly demonstrate that it is a linguistic(ally relevant) resource.
Without links, it cannot be included in the LLOD cloud anyway (because it
violates the LOD criteria), so this should not be an obstacle, even without
publications or demonstrable specialization of its creator.

At the same time, it might encourage potential contributors to publish  
data set descriptions or case studies using this data at conferences or  
workshops. Such papers will always have a place in the LDL and MLODE
workshops, so a dedicated publication channel would already be available.
And as these workshops are major activities of the OWLG along with LLOD
cloud development, this provides some degree of internal coherence between
both efforts.

Best,
Christian
-- 
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931