[open-linguistics] Linguistic relevance

Thu Jan 22 11:51:52 UTC 2015

> I have struggled for quite some time with trying to understand how the
> LLOD cloud is supposed to be used. So I guess my proposal would be to
> select data sets for inclusion based on their relevance for the relevant
> use cases.
>

I think that's exactly the problem we need to cope with. Depending on one's
scientific background, these use cases and goals are very likely to
diverge. That's what makes finding a shared definition in an
interdisciplinary context quite a hard problem. I think a primary and
general use case for the diagram would be to help external users locate
LLOD data and metadata sets and assess their usability. While most of
these are part of the LOD cloud as well (since Aug. 2014 even with their
own top-level category), our efforts basically aim to provide improved,
linguistics-specific metadata about them, to visualize them in an
appealing way, and to document strategies for using this specific data and
the underlying technology within our community/-ies.

But if the diagram is meant to be a means of communication towards external
users, we basically need to define *their use cases*, not ours. In any
case, we should also collect use cases (we did so in the LDL and MLODE
[post-]proceedings and other publications, but not in a central place on
the wiki). After coming up with a preliminary definition of linguistic
relevance, we should intensify our efforts in this direction, and -- if
necessary -- revise that definition afterwards.

> E.g. if the LLOD cloud is supposed to be a curated, aggregated data set
> like bio2rdf, used by linguists to do research, only data sets which can be
> automatically integrated into it would qualify (and AFAIKT this would
> exclude quite a few data sets) - and inclusion could be handled on a case
> by case basis upon request. If on the other hand the LLOD cloud is a
> collection of metadata mainly used to generate the LLOD cloud image, other
> selection criteria may be more appropriate.
>

What do you mean by automatically integrated? Everything in the diagram
should have explicit links to other resources. If it doesn't, or if it is
not accessible, it should not be in the diagram, but it can still have
valid metadata at datahub (or its successor).

The diagram itself is certainly not the primary motivation for working on
resource conversion, linking and metadata, but it is a concrete
manifestation of the available resources. Improving the availability and
usability of linguistically relevant resources for linguistic research
*and* natural language processing is the actual goal of our efforts,
along with -- of course -- promoting open resources as an even more general
goal.

In any case, the selection criteria depend on coming up with an operational
definition of linguistic relevance which permits both technical and
academic use cases, so everyone is welcome to specify which resources s/he
wants to have included or excluded and to come up with a definition
proposal that captures this intuition.

Best,
Christian