[open-linguistics] How to represent LLOD diagram categories at datahub ?
klimek at informatik.uni-leipzig.de
Wed Nov 27 10:25:24 UTC 2013
Hi John and Christian,
thanks a lot for contributing to my suggestions. I think we are really
close to arriving at a final categorization that everyone could accept.
First of all, I agree with Christian on working out these test questions
to facilitate the categorisation process. If we could agree on the
definitions as well and integrate them as tool tips, or put them
somewhere else on the website, that would be great.
Then I would like to turn to the ISOcat problem again. Given Christian's
definition of metadata, I can agree to take it out of the lexicon
category. In fact the ISOcat categories described are not part of a
language system but define categories to describe elements of language
Your comments revealed that you both really didn't like my category #4
and that there seems to be no difference for you between a lexicon and
an RDF version of a lexicon. All data sets I put into category #4 have
in common that the data they contain is machine-readable. Of course it
is a requirement that all data sets are in RDF in order to be linked
with each other in the cloud. But some data sets (all those I put in
category #4) are only RDF (or OWL) data. They differ fundamentally from
data sets containing the original data AND the derived RDF
representations. Let's take the original Princeton Wordnet and Wordnet
3.0 (VU Amsterdam) as an example. To illustrate the difference, consider
the following analogy:
Hindi (is to) Hindi Grammar (as) Wordnet (is to) Wordnet 3.0
A grammar of Hindi would never be considered equal to the whole Hindi
language. It is only a description of the language, and in this respect
it shares several properties with the RDF version of Wordnet. Both are
linguistic resources which are derived from some other linguistic
source. In the derivation the source data gets changed and is
transformed into a more abstract representation. The grammar and the RDF
version have their own underlying structural principles. These can be
applied to describing many different languages (for a grammar) or
linguistic resources (for RDF), and they need not be in the same
language as the source data. Let's also consider that the relation
between the two kinds of data is unidirectional: the Hindi language (and
Wordnet) exists independently of the Hindi grammar (and Wordnet 3.0),
but the Hindi grammar cannot be written without the Hindi language. The
same holds for all RDF versions: they are always a description OF
something and therefore bound to the source data.
I think this is not problematic at all, and I don't know why it should
be impossible to classify all the data sets I put in my ontology
category into Christian's metadata category (because they fit there
perfectly according to the definition). All I would like to keep apart
are the source data and any data (even metadata) derived from them. That
means exactly: if a data set contains source data AND metadata, it
should be classified according to its source data. This holds for
rosetta.org, for example. It provides a language classification but is
also working on an RDF version of it. That means the same data set
contains the source and the metadata. But if the metadata is held
separately in another data set that does not include the source data,
that data set should be classified as metadata.
That is why I assigned the Princeton Wordnet data set to the lexicon
category and the Wordnet 3.0 (VU Amsterdam) data set to the metadata
category (former #4). That way we do not have to put the derived data
into Christian's "other" category and can keep that reserved for the
"linguistically relevant data sets not directly containing linguistic
This is the only change I would make to Christian's categorization.
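
To make the rule above concrete, here is a minimal sketch in Python. It
is only an illustration under assumptions: the class, the field names
(has_source_data, has_derived_metadata, source_category) and the
function classify() are hypothetical and not part of datahub or of any
existing tool, and the flags on the two example data sets simply encode
my reading of them as described above.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """Hypothetical description of a data set, for illustration only."""
    name: str
    has_source_data: bool        # contains the original linguistic data
    has_derived_metadata: bool   # contains a derived RDF/OWL description
    source_category: str = "other"  # category of the source data, if any


def classify(ds: DataSet) -> str:
    """Apply the proposed rule: source data always decides the category."""
    if ds.has_source_data:
        # Source data present (with or without derived metadata):
        # classify according to the source data.
        return ds.source_category
    if ds.has_derived_metadata:
        # Only a derived description, source data lives elsewhere.
        return "metadata"
    # Neither source data nor a derived description of it.
    return "other"


# Flags below are assumptions that mirror the assignment argued for above.
princeton = DataSet("Princeton Wordnet", True, False, "lexicon")
vu = DataSet("Wordnet 3.0 (VU Amsterdam)", False, True)

for ds in (princeton, vu):
    print(ds.name, "->", classify(ds))
```

A data set that holds both the source and the derived RDF version takes
the first branch, so it ends up in the category of its source data, not
in metadata.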
If we can agree on taking my #4 data sets out of Christian's category
f) OTHER and adding them to Christian's category e) METADATA, I could
make this little change quickly and correct it in my categorization
spreadsheet. After that, the colouring of the cloud should not be far
off.
All the best,