[open-linguistics] How to represent LLOD diagram categories at datahub ?
Bettina Klimek
bettina.klimek at uni-leipzig.de
Fri Oct 11 20:37:08 UTC 2013
Dear all,
I have also been thinking about how to categorize the data sets in the
LLOD cloud. To me, a classification should be oriented toward the
people who are most interested in the data: linguists. It therefore
makes sense to find categories that are broad enough for 5-6 category
labels to give a first holistic overview of the kinds of data in the
cloud, and narrow enough to allow for an exhaustive and unambiguous
classification. That way linguists can find the kind of data sets they
are looking for at a glance, and one avoids mixed categories, which
make it hard to assign a data set to a single category.
I agree with everyone that the third category, "language description",
is too broad and includes, as Christian already mentioned, various
kinds of data, which would make for a somewhat fuzzy category. Besides,
linguists have a different understanding of "language description", as
Sebastian (N) pointed out. The idea of setting up definitions for the
categories seems very useful. In the context of the LLOD cloud,
however, I think that reusing existing definitions is problematic,
because they might not serve the specific needs of the field of
linguistic data. Following Sebastian's (H) idea, establishing our own
definitions would be a good way of creating a coherent and
homogeneous classification of different linguistic data sets. This is
what I have tried to do, and you can see the result in this Google
document:
https://docs.google.com/document/d/1skUbkYlM5Y6UiettCj7-hImdKandl3TsqFDiVTRlthE/edit?usp=drive_web.
The 5 categories I propose here are well known in the field of
linguistics and are likely the kinds of data a linguist would want to
work with. I also introduced some subcategories, because there are
obviously many more kinds of data and not every data type can be
highlighted with a color in the cloud. I tried to solve this problem
by treating these 5 categories as default categories, to one of which
each data set must be assigned at the highest level. The subcategories
can be extended and adjusted to any data set depending on the data it
represents. Beyond that, the subcategories are also an aid for people
who would like to contribute their data set to the cloud, because they
know best whether their data is a database or a corpus. Establishing a
second category layer under the default categories will also lead to a
finer-grained
classification. I do not know if it is possible, but I can imagine
that the subcategories are visualized in a sub-cloud as well. In
detail, this means that if someone clicks on the database category, for
example, a new cloud opens showing only the data sets that include
databases, and these bubbles could in turn be colored according to
the subcategories "lexical database", "typological database" and so
on. That way the data cloud stays really open, because the subcategory
layer can be extended with new types of data sets. At the same time
the main cloud can keep the default categories, because they include
all the subcategories. From a linguist's point of view this seems
really useful, since a typologist, for instance, might not want to see
all data sets containing databases but only those that include
typological databases.
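
To make the two-layer idea more concrete, here is a minimal sketch in
Python of how the default categories and the open subcategory layer
could fit together. It is only an illustration under assumptions: the
names DataSet, main_cloud and sub_cloud are hypothetical, the data set
names are invented, and only the two categories named in this message
("database" and "corpus") are listed; the full set of five is in the
Google document.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Placeholder: only "database" and "corpus" are named in this message;
# the full set of five default categories is in the Google document.
DEFAULT_CATEGORIES = {"database", "corpus"}


@dataclass(frozen=True)
class DataSet:
    name: str
    category: str                      # must be one of the default categories
    subcategory: Optional[str] = None  # open-ended, may be a new label

    def __post_init__(self):
        # Every data set has to be assigned to a default category.
        if self.category not in DEFAULT_CATEGORIES:
            raise ValueError(f"unknown default category: {self.category!r}")


def main_cloud(data_sets):
    """Group data set names by default category (top-level coloring)."""
    cloud = defaultdict(list)
    for ds in data_sets:
        cloud[ds.category].append(ds.name)
    return dict(cloud)


def sub_cloud(data_sets, category):
    """Group the data sets of one default category by subcategory."""
    cloud = defaultdict(list)
    for ds in data_sets:
        if ds.category == category:
            cloud[ds.subcategory or "unspecified"].append(ds.name)
    return dict(cloud)


# Hypothetical data sets, for illustration only.
data_sets = [
    DataSet("example-lexicon", "database", "lexical database"),
    DataSet("example-typology", "database", "typological database"),
    DataSet("example-corpus", "corpus"),
]
```

Calling sub_cloud(data_sets, "database") would correspond to clicking
on the database category: it returns only the database data sets,
grouped by subcategory. A new subcategory label needs no change to the
default categories, which is the point of keeping the second layer
open.
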
So far this is only a first draft, and it will have to be adjusted.
Right now I am going through all data sets in the cloud to find out
what kinds of data exist and whether this classification could work. I
would be happy to hear your suggestions for improvement.
With kind regards,
Bettina