[open-linguistics] LLOD diagram draft

Philipp Cimiano cimiano at cit-ec.uni-bielefeld.de
Wed Apr 9 12:24:31 UTC 2014


Dear all,

  apologies, but my connection here is very bad and I cannot follow the 
Skype telco, so I am providing my input here in reply to Christian's email.

In principle I like the top categories: corpus, lexicon, metadata. But I 
would recommend reusing categories proposed by others. For example, the 
Metashare node of UPF uses the following categories (thanks to Jorge for 
providing them):

  * Lexical Conceptual Resource (94)
      o Lexicon (77)
      o Wordnet (6)
      o Terminological Resource (4)
      o Word List (4)
      o Ontology (3)
  * Corpus (30)
  * Tool Service (10)


I think reusing these categories (except for Tool Service) would be 
fine. The numbers in brackets indicate the number of resources of the 
corresponding type available. Adding Metadata would be good.

ParallelCorpus as a subcategory of Corpus seems appropriate and useful, as 
just suggested in the telco (I picked that up ;))

Other than that, the subcategories of Corpus would be defined by the 
annotation layers the corpus contains; getting too fine-grained at the 
level of the cloud is difficult.

In any case in the future I hope that we can dynamically generate 
different diagrams filtering by conditions, e.g. license, annotation 
layers available, language etc.
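
A minimal sketch of such filtering, assuming each dataset is described by a
simple record. The field names (license, language, layers, links) and the
sample entries are hypothetical and not taken from Datahub; the point is
only that a filtered diagram keeps the matching datasets and the links
between them:

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # All fields are hypothetical; real Datahub metadata is structured differently.
    name: str
    category: str
    license: str
    language: str
    layers: set = field(default_factory=set)   # annotation layers, e.g. {"pos"}
    links: set = field(default_factory=set)    # names of datasets this one links to


def filter_datasets(datasets, license=None, language=None, layer=None):
    """Keep the datasets that match every condition that is not None."""
    selected = []
    for d in datasets:
        if license is not None and d.license != license:
            continue
        if language is not None and d.language != language:
            continue
        if layer is not None and layer not in d.layers:
            continue
        selected.append(d)
    return selected


def diagram_edges(datasets):
    """Return the links whose source and target both survived the filter."""
    names = {d.name for d in datasets}
    return sorted((d.name, t) for d in datasets for t in d.links if t in names)


# Made-up sample data for illustration only.
SAMPLE = [
    Dataset("corpus-a", "Corpus", "CC-BY", "de", {"pos", "syntax"}, {"lexicon-b"}),
    Dataset("lexicon-b", "Lexicon", "CC-BY", "de", set(), {"corpus-a"}),
    Dataset("corpus-c", "Corpus", "CC-BY-NC", "en", {"pos"}, {"lexicon-b"}),
]

open_german = filter_datasets(SAMPLE, license="CC-BY", language="de")
print([d.name for d in open_german])
print(diagram_edges(open_german))
```

Each condition is optional, so the same function could drive a diagram per
license, per language, or per annotation layer.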




On 04.04.14 21:44, Christian Chiarcos wrote:
> Dear all,
>
> please find the first draft for the new LLOD cloud diagram attached.
>
> An important difference as compared to the last draft is that *only 
> datasets with links to other LLOD datasets are included*. Data sets 
> for which we could not read information from any of the URLs given in 
> Datahub were excluded.
>
> If you don't find your dataset displayed properly (or missing), please 
> check your Datahub entry!
>
> Differences as compared to last edition:
> - Categories revised, now at two levels of granularity (feedback please!)
> - Novel data sets, including the datasets of LDL-2014 contributions 
> and the associated data challenge
> - Included linguistically relevant Datahub entries *not* marked as 
> resources of the linguistics group (e.g., the Greek WordNet). We 
> extracted all Datahub entries with tags "llod", "linguistics%20lod", 
> "lexicon", "corpus", "thesaurus", "linguistic", "linguistics", or 
> "typology".
> - Diagram pruning: Eliminate data sets not linked with other LLOD data 
> sets
>
> Known issues:
> - Edge breadth and bubble size reflect the link/triple counts as given 
> in Datahub. Where this information is not found, edges are missing or 
> bubbles are equally sized.
> - Datasets from the LREC Share Your Resources Initiative have not been 
> included yet. We can discuss at the telco next week whether we want to 
> prepare a May-2014 edition that covers this (and other) data.
>
> All the best,
> Christian


-- 

Prof. Dr. Philipp Cimiano

Phone: +49 521 106 12249
Fax: +49 521 106 12412
Mail: cimiano at cit-ec.uni-bielefeld.de

Forschungsbau Intelligente Systeme (FBIIS)
Raum 2.307
Universität Bielefeld
Inspiration 1
33619 Bielefeld
