[open-linguistics] LLOD diagram draft (feedback until Tue May 20th)
christian.chiarcos at web.de
Sat May 17 19:45:53 UTC 2014
please find the current draft for the LLOD cloud diagram attached.
- It is now *obligatory* to provide size information, using the field
"triples". This is because our fall-back strategy to use default size
- Novel datasets have been added, most prominently those of the LREC-2014
Share your Resources Initiative. On this extended dataset, 142 candidate
datasets are identified in datahub, but we know triples and links for only
63. For 6 more datasets (dbpedia-ko, dbpedia-cs, dbpedia-it, dbpedia-es,
wiktionary-dbpedia-org, dbpedia-de), we have links, but no triples.
- Datasets lacking "triples" or "links:..." (to or from other datasets)
information in the datahub entry are excluded from the diagram. The number
of datahub datasets has grown substantially, but we use this to enforce
stricter criteria rather than to draw a larger diagram.
- The SVG now contains active hyperlinks to the datasets.
- Bubble positioning follows the September-2013 diagram (the last official
one we released), not the April draft. The layout is not final yet.
- We reached a consensus to distinguish three sub-groups of
lexical-conceptual resources, i.e., (1) domain-specific vocabularies, (2)
general semantic knowledge bases, (3) lexical resources. For reasons of
time, the current diagram distinguishes only two groups, (1+2) vs. (3), and
(unless someone volunteers to reclassify the 1+2 datasets by Tuesday) the
upcoming official diagram will preserve this classification.
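
To illustrate the inclusion rule described above (a dataset needs "triples" and at least one "links:..." entry to or from another dataset), here is a minimal sketch. It assumes the datahub metadata has already been fetched into a plain dictionary mapping dataset names to their extra fields; the function names and the numbers in the example are made up for illustration, and this is not the script actually used to build the diagram.

```python
def to_int(value):
    """Parse a metadata value as an integer; return 0 if missing or malformed."""
    try:
        return int(str(value).strip())
    except ValueError:
        return 0


def linked_names(datasets):
    """Names of datasets with at least one link to or from another dataset."""
    linked = set()
    for name, extras in datasets.items():
        for key, value in extras.items():
            if key.startswith("links:") and to_int(value) > 0:
                target = key[len("links:"):]
                linked.add(name)        # outgoing link
                if target in datasets:
                    linked.add(target)  # incoming link
    return linked


def diagram_datasets(datasets):
    """Datasets that have both a positive 'triples' count and some link."""
    linked = linked_names(datasets)
    return sorted(
        name
        for name, extras in datasets.items()
        if to_int(extras.get("triples")) > 0 and name in linked
    )


# Hypothetical metadata; all counts are invented for the example.
example = {
    "lexvo": {"triples": "1000", "links:dbpedia": "50"},
    "dbpedia": {"triples": "2000"},          # only linked *from* lexvo
    "dbpedia-de": {"links:dbpedia": "10"},   # links, but no triples
    "isolated": {"triples": "500"},          # triples, but no links
}

print(diagram_datasets(example))
```

In this toy input, only the datasets with both size and link information survive; an entry such as dbpedia-de (links but no triples) is dropped, which mirrors the six cases mentioned above.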
Please check whether there are any issues in size, links and labels, and
if so, please update the datahub entries *and let me know* by Tuesday.
The official diagram should be ready by Thursday morning.
What is interesting as compared to the September 2013 diagram (for which
linking was not obligatory) is that we can see emerging nuclei (lexvo and
lexinfo in metadata, DBpedia in lexical/conceptual resources/general KBs).
Surprisingly, there is little convergence in the lexical/conceptual
resources/lexical resources sub-group so far.
We will continue to apply increasingly rigid criteria on our data sets. At
the moment, the condition is that they are available as RDF over the web,
linked, and the URLs given in Datahub do respond. Subsequent editions will
tighten these constraints, e.g., by requiring a license conformant to the
Open Definition, by automatically checking and evaluating the RDF data
(instead of relying on metadata only) or by enforcing stricter criteria as
to what constitutes a "linguistically/NLP-relevant resource". We can
discuss over the mailing list and in the telcos how to prioritize these (and
possibly other) criteria.
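
As a rough sketch of how the "URLs given in Datahub do respond" condition could be checked automatically, the following uses only the Python standard library. The function name, the choice of a HEAD request, the timeout, and the "status below 400" threshold are all assumptions for illustration, not a description of the current procedure. The `opener` parameter is there only so that the check can be tried out without network access.

```python
import urllib.request


def url_responds(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if a HEAD request to `url` succeeds with a status below 400.

    Any network error, HTTP error status, or malformed URL counts as
    'does not respond'.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # urllib.error.URLError and HTTPError are subclasses of OSError;
        # ValueError is raised for strings that are not valid URLs.
        return False
```

Some servers reject HEAD requests even though the data is available, so a real check would probably fall back to GET before declaring a URL dead.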
All the best,
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany
office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 109066 bytes
Desc: not available