[open-linguistics] How to represent LLOD diagram categories at datahub ?
Christian Chiarcos
christian.chiarcos at web.de
Sun Oct 6 14:05:39 UTC 2013
> Let a thousand ontologies blossom!
Well, this would be easily possible with a coloring based on tags rather
than a custom feature. This is one reason I suggested a tag-based approach
before. Everyone ok with that?
Alternative colorings may be useful, e.g., for the language (family) a
resource refers to, or its modality, or the creator/maintainer, etc., and
the new script John and I are developing should be easily adaptable for
this purpose.
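As a rough illustration of what tag-based coloring along different
dimensions could look like (this is not the actual script; the palette
contents, the tag names "written" and "spoken", the color values and the
function name are all assumptions made for the example):

```python
# Hypothetical palettes: one tag-to-color mapping per coloring dimension.
# Tag names and color values are illustrative assumptions only.
PALETTES = {
    "type": {"corpus": "#1f77b4", "lexicon": "#2ca02c"},
    "modality": {"written": "#ff7f0e", "spoken": "#9467bd"},
}
DEFAULT_COLOR = "#cccccc"  # used when no tag matches the chosen palette


def bubble_color(tags, dimension, palettes=PALETTES):
    """Return the color of the first tag known to the chosen palette."""
    palette = palettes[dimension]
    for tag in tags:
        if tag in palette:
            return palette[tag]
    return DEFAULT_COLOR


# The same resource gets a different color depending on the dimension.
tags = ["spoken", "corpus"]
print(bubble_color(tags, "type"))
print(bubble_color(tags, "modality"))
```

Switching to another coloring would then only mean supplying another
palette, with no change to the metadata entries themselves.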
But one dimension should be the types of resources, because this is a
perfect marketing instrument if we're talking to the respective
communities (say, lexicon people, NLP people or typologists), and it
underlines the multi-disciplinarity of the group. Hence, I would strongly
prefer to stay with a resource type classification in the "official"
diagram. Also, it should be relatively intuitive, relatively balanced and
use a small set of colors, as Sebastian (H) wrote.
I agree with Sebastian (N) that language_description is almost a misnomer;
we adopted it following the proposal to take OLAC as an orientation (which
is a good idea in general). I don't see any existing proposal we could
follow on the subclassification of these other resources, but we can easily
classify them according to what kind of information they provide:
i) information about (features of) languages [a language in its entirety,
not a particular text; this includes typological databases]
ii) information about specific language resources [excluding the data
itself, this includes bibliographies]
iii) information used to describe language and resources [e.g., linguistic
terminology, language identifiers; not tied to any specific data]
We may add to these
iv) information about the linguistic structure of longer, continuous
stretches of primary data, e.g., a text [may include the primary data]
v) information about semantic structures and entities [more or less
independent from any particular text, this includes wordnets and lexicons]
(iv) and (v) are "corpus" and "lexicon", respectively,
for (iii), I would suggest the term "terminology",
(ii) may be "resource metadata" [in order to generalize over
"bibliography"],
for (i) I don't have a strong intuition, maybe "language_description"
would be inappropriate in this case; even if typological databases are more
or less tabular data, they still represent a selected aspect of the
grammar of languages.
Is there any kind of resource we missed ?
In any case, how to classify a resource is up to its creator (or whoever
maintains the metadata entry at datahub), and using multiple tags at the
same time is never a problem. When drawing the diagram, however, we need
to define a selection preference in case multiple categories are
applicable. A very objective way to do this would be the following:
i) use the category with the lowest number of bubbles in the diagram at
the moment
ii) if there is a tie, follow the lexicographic order of category names
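
The two rules above can be sketched in a few lines of Python. This is
only a minimal sketch: the function names and the input format (a list of
(name, tags) pairs) are assumptions, and because "at the moment" depends
on what has already been drawn, the result depends on the order in which
resources are processed, which is not fixed by the rules themselves.

```python
from collections import Counter


def pick_category(candidate_tags, bubble_counts):
    """Pick one diagram category for a resource carrying several tags.

    Rule i:  prefer the category with the fewest bubbles so far.
    Rule ii: break ties by lexicographic order of the category name.
    """
    if not candidate_tags:
        raise ValueError("resource has no category tag")
    # Tuples compare element-wise: bubble count first, then name.
    return min(candidate_tags, key=lambda tag: (bubble_counts[tag], tag))


def assign_categories(resources):
    """Assign one category per resource, updating the counts as we go.

    `resources` is a list of (name, tags) pairs (an assumed format).
    """
    counts = Counter()  # missing categories count as 0 bubbles
    assignment = {}
    for name, tags in resources:
        category = pick_category(tags, counts)
        counts[category] += 1
        assignment[name] = category
    return assignment


# Made-up resources, for illustration only.
example = [
    ("A", ["corpus"]),
    ("B", ["corpus", "lexicon"]),
    ("C", ["lexicon", "corpus"]),
    ("D", ["terminology", "corpus"]),
]
print(assign_categories(example))
```

In this made-up example, B goes to "lexicon" (no bubbles yet), while C
is a tie between "corpus" and "lexicon" and goes to "corpus" by name.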
Ideas ?
Christian
--
Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany
office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931