[open-linguistics] Linguistic relevance

Robert Forkel xrotwang at googlemail.com
Thu Jan 22 19:53:51 UTC 2015


I understand that use cases will be very different, but still think they
should be the drivers of what gets included in the LLOD - i.e. if no one
has ever successfully used a dataset for anything, it shouldn't be
included. At least from my perspective this would help prevent the
frustration when datasets in the diagram turn out to be unusable.

I'll give an example: I spotted the ietflang dataset (
http://datahub.io/dataset/ietflang) in the diagram and was interested in
using it since mapping between different kinds of language identifiers is
something I have to do often. I couldn't figure out a way to get the whole
dataset - and reading a 2-year-old unanswered comment asking this very
question didn't inspire much confidence.

I also think some clear exemplary use cases would help data creators decide
whether it would be useful for their datasets to be included, i.e. these
use cases could make data creators *want* to have the datasets included.

best
robert


On Thu, Jan 22, 2015 at 12:51 PM, Christian Chiarcos <
christian.chiarcos at web.de> wrote:

>
>> I have struggled for quite some time with trying to understand how the
>> LLOD cloud is supposed to be used. So I guess my proposal would be to
>> select data sets for inclusion based on their relevance for the relevant
>> use cases.
>>
>
> I think that's exactly the problem we need to cope with. Depending on
> one's scientific background, these use cases and goals are very likely to
> diverge. That's what makes finding a shared definition in an
> interdisciplinary context quite a hard problem. I think a primary and
> general use case for the diagram would be to help external users locate
> LLOD data and metadata sets and assess their usability. While most of
> these are part of the LOD cloud as well (since Aug. 2014 even with their
> own top-level category), our efforts are basically to provide improved,
> linguistics-specific metadata about them, to visualize them in an
> appealing way, and to document strategies for using this specific data
> and the underlying technology within our communities.
>
> But if the diagram is meant to be a means of communication towards
> external users, we basically need to define *their use cases*, not ours. In
> any case, we should also collect use cases (we did so in the LDL and MLODE
> [post-]proceedings and other publications, but not in a central place in
> the wiki). After coming up with a preliminary definition of "linguistically
> relevant", we should intensify our efforts in this direction, and -- if
> necessary -- revise the definition of linguistic relevance afterwards.
>
>> E.g. if the LLOD cloud is supposed to be a curated, aggregated data set
>> like bio2rdf, used by linguists to do research, only data sets which can be
>> automatically integrated into it would qualify (and AFAICT this would
>> exclude quite a few data sets) - and inclusion could be handled on a case
>> by case basis upon request. If on the other hand the LLOD cloud is a
>> collection of metadata mainly used to generate the LLOD cloud image, other
>> selection criteria may be more appropriate.
>>
>
> What do you mean by automatically integrated? Everything in the diagram
> should have explicit links to other resources. If it doesn't, or it is not
> accessible, it should not be in the diagram, but it can still have valid
> metadata at datahub (or its successor).
> The diagram itself is certainly not the primary motivation for working on
> resource conversion, linking and metadata, but it is a concrete
> manifestation of the available resources. Improving the availability and
> usability of linguistically relevant resources for linguistic research
> *and* natural language processing is the actual goal of our efforts,
> along with -- of course -- promoting open resources as an even more
> general goal.
> In any case, the selection criteria depend on coming up with an
> operational definition of linguistic relevance which permits both technical
> and academic use cases, so everyone is welcome to specify which resources
> s/he wants to have included or excluded and to come up with a definition
> proposal that captures this intuition.
>
> Best,
> Christian