[open-linguistics] How to represent LLOD diagram categories at datahub ?

Mon Oct 7 22:44:52 UTC 2013

> So, I was looking for something more form-oriented. I am not sure I have
> found anything, but I did find some interesting discussion around the ORE
> and FaBiO ontologies. The developers of FaBiO, the FRBR-aligned
> Bibliographic Ontology, discuss part of this point in this post:
> http://opencitations.wordpress.com/2011/06/30/nomenclature-for-data-publications-and-citations/.

I found this very interesting, and we might be able to use FaBiO for
resources like WALS, which fit this definition:

"""
fabio:Dataset: “A collection of related facts, often expressed in  
numerical form and encoded in a defined structure.”
"""

(this is of course not really related to corpora, but still valuable for  
OWLG)

Best
Sebastian N


> Though I am sure that this data+annotation pairing has also been
> addressed in the DNA or medical areas of linked data and data
> storage. Does anyone know what patterns of description are being used in
> these cases?
>
> For what it's worth...
>
> - Hugh
>
>
>
> On Oct 6, 2013, at 4:37 AM, hellmann at informatik.uni-leipzig.de wrote:
>
>> Let a thousand ontologies blossom!
>>
>> I am in favor of creating several different colorings with the  
>> potential to add your own.
>>
>> This can be modelled by:
>> - a dimension/aspect of the coloring
>> - an assignment value -> color for that dimension
>>
>> The reason for this is that I would like to make diagrams with  
>> different colors. One aspect could be for hosting/boasting purposes,  
>> e.g. which institute/company is hosting the data to give credit.
>>
>> This gives us pretty good features, e.g. we can make a heat map with the
>> number of described languages as the dimension.
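
The model above (a dimension plus a value -> color assignment) could be
sketched roughly as follows. This is only an illustration: the class name
Coloring, the "host" metadata key, the institute names and the hex colors
are all made up for the example and are not taken from the datahub
metadata.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Coloring:
    """One diagram coloring: a dimension plus a value -> color assignment."""
    dimension: str              # metadata key the coloring is based on
    palette: Dict[str, str]     # assignment value -> color
    default: str = "#cccccc"    # fallback for values without an assignment

    def color_for(self, dataset: Dict[str, str]) -> str:
        """Look up the color of one dataset under this coloring."""
        value = dataset.get(self.dimension)
        return self.palette.get(value, self.default)


# Hypothetical coloring by hosting institution (all values are placeholders).
by_host = Coloring(
    dimension="host",
    palette={"institute-a": "#1f77b4", "institute-b": "#ff7f0e"},
)

datasets = [
    {"name": "dataset-1", "host": "institute-a"},
    {"name": "dataset-2", "host": "institute-b"},
    {"name": "dataset-3", "host": "institute-c"},  # no color assigned yet
]

colors = {d["name"]: by_host.color_for(d) for d in datasets}
```

Adding your own coloring then only means creating another Coloring with a
different dimension and palette; the datasets themselves stay untouched.
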
>>
>> Furthermore, I am a big opponent of classifications and a great fan of
>> criteria. One clear criterion is whether the dataset contains primary
>> data. This would qualify it partially as a corpus in my opinion. There
>> are some fringe cases of course, e.g. dictionaries citing sentences
>> from newspapers as examples. So based on the 'contains primary data'
>> property, corpora could be defined as 'datasets that have primary data
>> and annotations relating to this primary data'.
>>
>> Lexica 'may or may not contain primary data, but the primary data is an
>> annotation for the main content', i.e. the entries in a dictionary are
>> annotated by newspaper examples.
>>
>>
>> We should probably discuss it on this level, i.e. what kind of
>> differently colored clouds would we like to have, what dimensions or
>> aspects do we need and what kind of metadata do we need to collect.
>>
>> Other than that, I would prefer prettiness as the main criterion for the
>> official LLOD cloud. Let's say 4-6 colors which are pleasing to the eye
>> ;) We probably do not have to make a science out of it and can leave it
>> fuzzy for now.
>>
>> @Hugh: we should aim at creating a consensual framework for resource
>> classification eventually...
>>
>> --Sebastian
>>
>>
>>
>>
>> Sebastian Nordhoff <sebastian_nordhoff at eva.mpg.de> wrote:
>> On Sat, 05 Oct 2013 12:26:43 +0200, Christian Chiarcos
>> <christian.chiarcos at web.de> wrote:
>>
>> Dear all,
>>
>> earlier, we discussed categories for coloring the LLOD diagram. The
>> diagram we prepared for LDL-2013 was based on something like a
>> minimal consensus:
>>
>> - lexicon (= LREMap lexicon, olac:lexicon)
>> - corpus (= LREMap corpus, ~ olac:primary data)
>> - language_description (basically everything else, ~
>> olac:language_description)
>>
>> I guess the first two are unproblematic, but the third is very
>> heterogeneous, it includes
>> - terminology repositories
>> - typological databases
>> - bibliographical databases
>> In a way, all of these "describe language" (information about languages,
>> information about concepts relevant to the description of language,
>> information about collections of language data), but honestly, I would
>> prefer the label "other", because this is very different from what I
>> think an olac:language_description is meant to be.
>>
>> As far as I can see, a language description would be a (sketch) grammar
>> or a learner's manual or similar. I think we have none of those in the
>> LLOD cloud (though we might in the future). olac:language_description
>> does not seem to be a good choice there.
>>
>> I agree with Christian that there is not a lot of internal coherence in
>> group 3. What would be the reason against having 5 groups, rather than
>> 3? The typological databases group nicely, and I intend to add some more
>> typological databases over the next months. Terminology repositories
>> can also be grouped. This only leaves Glottolog as the odd one out, and
>> we can call it "other".
>>
>> I suppose we will have to have some labels for groups 3a and 3b, which
>> should be dereferenceable. Is there not something like xyz:tabulardata
>> for typological databases which we could subclass?
>>
>> Best
>> Sebastian
>>
>>
>> Two questions:
>> - Is this general classification acceptable?
>> - How shall we encode the categories? Using tags "lexicon", "corpus",
>> etc.? Or using a custom field "LLOD category"? Unless anyone protests,
>> I would suggest using tags for "lexicon" and "corpus" and classifying
>> everything without such a tag as "language_description".
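
The tag-based fallback suggested above can be sketched in a few lines. The
function name llod_category is hypothetical, and the precedence of
"lexicon" over "corpus" when a dataset carries both tags is an assumption;
the proposal itself does not say how to handle that case.

```python
def llod_category(tags):
    """Return the diagram category for a dataset, given its datahub tags."""
    normalized = {t.strip().lower() for t in tags}
    if "lexicon" in normalized:
        # Assumption: "lexicon" wins if both tags are present.
        return "lexicon"
    if "corpus" in normalized:
        return "corpus"
    # Everything without one of the two tags falls into the third group.
    return "language_description"
```

For example, llod_category(["typology", "lod"]) falls through to
"language_description", which is exactly the heterogeneity discussed
above.
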
>>
>> Best,
>> Christian
>>
>>
>> open-linguistics mailing list
>> open-linguistics at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-linguistics
>> Unsubscribe: http://lists.okfn.org/mailman/options/open-linguistics