[open-linguistics] How to represent LLOD diagram categories at datahub ?

Sun Oct 6 14:05:39 UTC 2013

> Let a thousand ontologies blossom!

Well, this would be easily possible with a coloring based on tags rather  
than a custom feature. This is one reason I suggested a tag-based approach  
before. Everyone ok with that?

Alternative colorings may be useful, e.g., for the language (family) a  
resource refers to, or its modality, or the creator/maintainer, etc., and  
the new script John and I are developing should be easily adaptable for  
this purpose.

But one dimension should be the types of resources, because this is a  
perfect marketing instrument if we're talking to the respective  
communities (say, lexicon people, NLP people or typologists), and it  
underlines the multi-disciplinarity of the group. Hence, I would strongly  
prefer to stay with a resource type classification in the "official"  
diagram. Also, it should be relatively intuitive, relatively balanced and  
use a small set of colors, as Sebastian (H) wrote.

I agree with Sebastian (N) that language_description is almost a misnomer.  
We used it following the proposal to take OLAC as an orientation (which is a  
good idea in general). I don't see any existing proposal we could follow  
on the subclassification of these other resources, but we can easily  
classify them according to what kind of information they provide:

i) information about (features of) languages [each language in its  
entirety, not a particular text; this includes typological databases]
ii) information about specific language resources [excluding the data  
itself, this includes bibliographies]
iii) information used to describe language and resources [e.g., linguistic  
terminology, language identifiers; not tied to any specific data]

We may add to these
iv) information about the linguistic structure of longer, continuous  
stretches of primary data, e.g., a text [may include the primary data]
v) information about semantic structures and entities [more or less  
independent from any particular text, this includes wordnets and lexicons]

(iv) and (v) are "corpus" and "lexicon", respectively,
for (iii), I would suggest the term "terminology",
(ii) may be "resource metadata" [in order to generalize over  
"bibliography"],
for (i) I don't have a strong intuition. Maybe "language_description"  
would not be inappropriate in this case: even if typological databases are  
more or less tabular data, they still represent a selected aspect of the  
grammar of languages.

Is there any kind of resource we missed?

In any case, how to classify a resource is up to its creator (or whoever  
maintains the metadata entry at datahub), and using multiple tags at the  
same time is never a problem. When drawing the diagram, however, we need  
to define a selection preference in case multiple categories are  
applicable. A very objective way to do this would be the following:

i) use the category with the lowest number of bubbles in the diagram at  
the moment
ii) if there is a tie, follow the lexicographic order of category names
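
To make the two rules above concrete, here is a minimal Python sketch. It is
only an illustration, not the script John and I are working on: the function
names, the resource names and the input format (a list of resources, each
with its candidate category tags) are assumptions made up for this example.

```python
from collections import Counter


def pick_category(candidates, bubble_counts):
    """Pick one display category for a resource with several applicable tags.

    Rule i): prefer the category with the fewest bubbles so far.
    Rule ii): break ties by lexicographic order of the category name.
    """
    if not candidates:
        raise ValueError("resource has no applicable category")
    # The tuple key sorts by current count first, then by name.
    return min(candidates, key=lambda c: (bubble_counts.get(c, 0), c))


def assign_categories(resources):
    """Assign a display category to each (name, tags) pair, in input order."""
    counts = Counter()  # bubbles per category "at the moment"
    assignment = {}
    for name, tags in resources:
        chosen = pick_category(tags, counts)
        assignment[name] = chosen
        counts[chosen] += 1
    return assignment


# Hypothetical resources, for illustration only.
resources = [
    ("res_a", ["lexicon", "corpus"]),
    ("res_b", ["corpus", "lexicon"]),
    ("res_c", ["corpus"]),
    ("res_d", ["lexicon", "corpus", "terminology"]),
]
result = assign_categories(resources)
```

Note that with this reading of "at the moment", the outcome depends on the
order in which resources are processed, so the script would need a fixed
order (e.g., alphabetical by resource name) to give reproducible diagrams.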

Ideas?
Christian
-- 
Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931