[open-linguistics] LLOD diagram draft (feedback until Tue May 20th)

Sat May 17 19:45:53 UTC 2014

Dear all,

please find the current draft for the LLOD cloud diagram attached.

Important features:

- It is now *obligatory* to provide size information, using the field
"triples". This is because our fall-back strategy of assuming a default
size caused confusion.
- New datasets have been added, most prominently those of the LREC-2014
Share your Resources Initiative. In this extended set, 142 candidate
datasets are identified in datahub, but we know triples and links for only
63. For 6 more datasets (dbpedia-ko, dbpedia-cs, dbpedia-it, dbpedia-es,
wiktionary-dbpedia-org, dbpedia-de), we have links, but no triples.
- Datasets lacking "triples" or "links:..." (to or from other datasets)
information in the datahub entry are excluded from the diagram. The number
of datahub datasets has grown substantially, but we use this to enforce
stricter criteria rather than to draw a larger diagram.
- The SVG now contains active hyperlinks to the datasets.
- Bubble positioning follows the September-2013 diagram (the last official  
one we released), not the April draft. The layout is not final yet.
- We reached a consensus to distinguish three sub-groups of
lexical-conceptual resources, i.e., (1) domain-specific vocabularies, (2)
general semantic knowledge bases, (3) lexical resources. For reasons of
time, the current diagram distinguishes only two, (1+2) and (3), and
(unless someone volunteers to reclassify the 1+2 datasets by Tuesday) the
upcoming official diagram will preserve this classification.
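
The inclusion rule from the list above (an entry needs both "triples" and
"links:..." information, where links may point to or from the dataset) can
be sketched as follows. This is only an illustration, not the script used
to draw the diagram: the function names are hypothetical, and it assumes
each datahub entry has already been read into a dict of its extra fields.

```python
def link_targets(extras):
    """Return the dataset names this entry declares links to.

    Assumes link fields are keyed as "links:<other-dataset>".
    """
    prefix = "links:"
    return {key[len(prefix):] for key in extras if key.startswith(prefix)}


def has_triples(extras):
    """True if the "triples" field holds a positive integer."""
    try:
        return int(str(extras.get("triples", "")).strip()) > 0
    except ValueError:
        return False


def select_for_diagram(entries):
    """Split entries into (included, links_only, excluded) name lists.

    entries maps a dataset name to its dict of extra fields.
    Only the first list would appear in the diagram; the second
    corresponds to "links, but no triples".
    """
    # A dataset counts as linked if it links out or is linked to.
    linked = set()
    for name, extras in entries.items():
        targets = link_targets(extras)
        if targets:
            linked.add(name)
            linked.update(targets)

    included, links_only, excluded = [], [], []
    for name, extras in sorted(entries.items()):
        if name not in linked:
            excluded.append(name)
        elif has_triples(extras):
            included.append(name)
        else:
            links_only.append(name)
    return included, links_only, excluded
```

Keeping the "links, but no triples" group separate makes it easy to see
which entries only need a size to qualify.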

Please check whether there are any issues with size, links, or labels, and
if so, please update the datahub entries *and let me know* by Tuesday.
The official diagram should be ready by Thursday morning.

What is interesting compared to the September 2013 diagram (for which
linking was not obligatory) is that we can see emerging nuclei (lexvo and
lexinfo in metadata, DBpedia in lexical/conceptual resources/general KBs).
Surprisingly, there is little convergence in the lexical/conceptual
resources/lexical resources sub-group so far.

We will continue to apply increasingly rigid criteria to our data sets. At
the moment, the condition is that they are available as RDF over the web,
linked, and that the URLs given in Datahub respond. Subsequent editions
will tighten these constraints, e.g., by requiring a license conformant to
the Open Definition, by automatically checking and evaluating the RDF data
(instead of relying on metadata only) or by enforcing stricter criteria as
to what constitutes a "linguistically/NLP-relevant resource". We can
discuss over the mailing list and in the telcos how to prioritize these
(and possibly other) criteria.
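
For the condition that the URLs given in Datahub respond, a minimal check
could look like the sketch below. It is an assumption about how such a test
might be done, not a description of the current procedure: the function
name, the timeout, and the `opener` parameter (which only exists so that
the check can be tried without network access) are all hypothetical.

```python
import urllib.error
import urllib.request


def url_responds(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if fetching url succeeds without an HTTP or network error.

    urlopen raises HTTPError for 4xx/5xx answers and URLError for
    connection problems; a malformed URL raises ValueError.
    """
    try:
        with opener(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

Such a check only shows that something answers at the URL; it does not
tell us whether the answer is RDF, which is what the automatic evaluation
mentioned above would have to add.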

All the best,
Christian
-- 
Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llod-cloud.2014-05-17.draft.svg
Type: image/svg+xml
Size: 109066 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/open-linguistics/attachments/20140517/b6c062d3/attachment-0002.svg>