[open-linguistics] How to represent LLOD diagram categories at datahub ?

hellmann at informatik.uni-leipzig.de
Sun Oct 6 11:37:39 UTC 2013


Let a thousand ontologies blossom!

I am in favor of creating several different colorings with the potential to add your own. 

This can be modelled by:
- a dimension/aspect of the coloring, and
- an assignment value->color for that dimension.
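
Just as a minimal sketch of what I mean, in Python. The names (Coloring, dimension, palette), the example dataset and the color codes are all made up for illustration; nothing here is an agreed schema or anything datahub provides:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Coloring:
    """One coloring of the diagram: a dimension plus a value->color map."""
    dimension: str            # metadata key to color by, e.g. "category"
    palette: Dict[str, str]   # assignment value -> color
    default: str = "#cccccc"  # color for values without an assignment

    def color_of(self, dataset: dict) -> str:
        # Look up the dataset's value for this dimension, then its color.
        value = dataset.get(self.dimension)
        return self.palette.get(value, self.default)


# Two hypothetical colorings over the same (hypothetical) metadata record.
by_category = Coloring("category", {"lexicon": "#1b9e77", "corpus": "#d95f02"})
by_host = Coloring("host", {"Uni Leipzig": "#7570b3"})

dataset = {"name": "example-dataset", "category": "lexicon", "host": "Uni Leipzig"}
```

Adding your own coloring then just means adding another Coloring object; the dataset metadata stays the same.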

The reason for this is that I would like to make diagrams with different colorings. One aspect could serve hosting/boasting purposes, e.g. showing which institute/company is hosting the data, to give credit.

This gives us pretty good features, e.g. we can make a heat map with the number of described languages as the dimension.
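
A heat map is the same idea with a numeric dimension. A rough sketch, assuming we have a language count per dataset; the function name heat_color and the white-to-red scale are my own arbitrary choices:

```python
def heat_color(count: int, max_count: int) -> str:
    """Map a language count to a hex color from white (0) to red (max_count)."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    # Clamp the count into [0, max_count] and scale it to [0, 1].
    ratio = min(max(count, 0), max_count) / max_count
    # Fade the green and blue channels out as the count grows.
    fade = round(255 * (1 - ratio))
    return f"#ff{fade:02x}{fade:02x}"
```

A dataset describing no languages comes out white, the one describing the most languages comes out fully red.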

Furthermore, I am a big opponent of classifications and a great fan of criteria. One clear criterion is whether the dataset contains primary data. This would qualify it partially as a corpus in my opinion. There are some fringe cases of course, e.g. dictionaries citing sentences from newspapers as examples. So based on the 'contains primary data' property, corpora could be defined as 'datasets that have primary data and annotations relating to this primary data'.

Lexica 'may or may not contain primary data, but the primary data is an annotation for the main content', e.g. the entries in a dictionary are annotated by newspaper examples.
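
To make the criteria idea concrete, a small sketch in Python. The two boolean properties and all names are hypothetical; they only restate the corpus definition above as a test, they are not existing metadata fields:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    contains_primary_data: bool
    # True if the annotations relate to the primary data (corpus-like);
    # False if the primary data only illustrates other content (lexicon-like).
    annotations_target_primary_data: bool = False


def is_corpus(ds: Dataset) -> bool:
    """Corpus = has primary data and annotations relating to that data."""
    return ds.contains_primary_data and ds.annotations_target_primary_data


# Hypothetical examples: an annotated newspaper corpus and the fringe case
# of a dictionary that cites newspaper sentences as examples.
news_corpus = Dataset("hypothetical-news-corpus", True, True)
dictionary = Dataset("hypothetical-dictionary", True, False)
```

The dictionary has primary data but does not count as a corpus here, because its examples annotate the entries and not the other way round.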


We should probably discuss it on this level, i.e. what kinds of differently colored clouds would we like to have, what dimensions or aspects do we need, and what kind of metadata do we need to collect.

Other than that, I would prefer prettiness as the main criterion for the official LLOD cloud. Let's say 4-6 colors which are pleasing to the eye ;) We probably do not have to make a science out of it and can leave it fuzzy for now.

@Hugh: we should aim at creating a consensual framework for resource classification eventually...

--Sebastian




Sebastian Nordhoff <sebastian_nordhoff at eva.mpg.de> wrote:
>On Sat, 05 Oct 2013 12:26:43 +0200, Christian Chiarcos  
><christian.chiarcos at web.de> wrote:
>
>> Dear all,
>>
>> earlier, we discussed categories for coloring the LLOD diagram. The
>> diagram we prepared for LDL-2013 was based on something like a
>> minimal consensus:
>>
>> - lexicon (= LREMap lexicon, olac:lexicon)
>> - corpus (= LREMap corpus, ~ olac:primary data)
>> - language_description (basically everything else, ~  
>> olac:language_description)
>>
>> I guess the first two are unproblematic, but the third is very  
>> heterogeneous, it includes
>> - terminology repositories
>> - typological databases
>> - bibliographical databases
>> In a way, all of these "describe language" (information about languages,
>> information about concepts relevant to the description of language,
>> information about collections of language data), but honestly, I would
>> prefer the label "other", because this is very different from what I
>> think an olac:language_description is meant to be.
>
>As far as I can see, a language description would be a (sketch) grammar or
>a learner's manual or similar. I think we have none of those in the LLOD
>cloud (though we might in the future). olac:language_description does not
>seem to be a good choice there.
>
>I agree with Christian that there is not a lot of internal coherence in
>group 3. What would be the reason against having 5 groups, rather than 3?
>The typological databases group nicely, and I intend to add some more
>typological databases over the next months. Terminology repositories can
>also be grouped. This only leaves Glottolog as the odd one out, and we can
>call it "other".
>
>I suppose we will have to have some labels for groups 3a and 3b, which
>should be dereferenceable. Is there not something like xyz:tabulardata for
>typological databases which we could subclass?
>
>Best
>Sebastian
>
>>
>> Two questions
>> - Is this general classification acceptable ?
>> - How shall we encode the categories ? Using tags "lexicon", "corpus",
>> etc. ? Or using a custom field "LLOD category" ? Unless anyone protests,
>> I would suggest to use tags for "lexicon" and "corpus" and classify
>> everything without such a tag as "language_description".
>>
>> Best,
>> Christian