[open-linguistics] Linguistic LOD cloud - help needed, now is the time to submit your data set

Christian Chiarcos christian.chiarcos at web.de
Thu Aug 9 04:14:53 UTC 2012


Hi Sebastian, dear all,

thank you very much for the initiative and for coordinating support for
RDF conversion of potential LLOD candidates.

Just for clarification: I wouldn't describe the result of your survey as
"shocking", but simply conformant to the (indeed!) *very loose*
requirement currently applied for inclusion in the LLOD diagram draft. The
requirement for the current draft was only that data providers *promise*
RDF conversion, open publication and linking, not that they have already
performed them. The idea was that interested colleagues from the group
may help with conversion and/or linking as soon as the bubble and its
potential linking have been announced (and in a few cases, this seems to be
underway -- I'm thinking of John and Judith here).

Actually, this is precisely why it is referred to as "draft", see the LREC
paper (http://www.lrec-conf.org/proceedings/lrec2012), Sect. 4.11. I think
everyone agrees that it would be great to shift from draft to official
status as soon as possible, and the MLODE workshop might bring us a leap
forward towards this goal, but at the moment, "draft" explicitly allows
the following resources to be included:

> - no RDF available
> - no links
> - no data online
> - too many bugs (e.g. Glottolog)

As for

> - no CKAN/datahub entry (e.g. ISOcat)

it would be great to have a CKAN moderator to take care of this. Until
then, registration is basically the responsibility of the data provider,
and this might represent an (albeit small) obstacle. We discussed that
recently; wasn't there a volunteer? At the moment, what we have instead is
a spreadsheet of candidates, and the task is mostly to transfer and update
this information. However, CKAN registration also requires contact
information, and this means (if deemed necessary) asking the authors
whether they would agree to have their contact data published, or
providing alternative contact information (for open resources, this could
be the person who registers the resource).
For me, providing contact information about third parties has been the
reason to hesitate with the CKAN registration.

The situation is different with

> - only schematic information (e.g. GOLD)

As for the specific case of GOLD (and ISOcat, OLiA, lexvo and lingvo), it
certainly does not qualify as a schema, but it formalizes domain-specific
terminology (that, incidentally, happens to be relevant to NLP tools,
although comparable repositories without NLP relevance exist,
e.g. http://www.ids-mannheim.de/gra/grammis.html, see the "Ontologie zur
deutschen Grammatik"). If the domain were not linguistic terminology, but
pizza recipes, it would count as an independent resource, and so should
these resources.
Of these, only the OLiA Annotation Models may be considered to contain
schema information (as they describe annotations in a corpus or produced
by an NLP tool), but the OLiA Reference Model is certainly not a schema,
but a terminology repository, and so are GOLD and (even though
semi-structured only) ISOcat. To put it in other words, these resources
indeed formalize knowledge about a domain (linguistic terminology), which
is not restricted to its use in corpora, etc., although it can be (and is)
also applied for this purpose.

All the best,
Christian



