[open-linguistics] Linguistic relevance

Fri Jan 23 12:13:24 UTC 2015

On Thu, Jan 22, 2015 at 8:53 PM, Robert Forkel <xrotwang at googlemail.com>
wrote:

> I understand that use cases will be very different, but I still think they
> should be the drivers of what gets included in the LLOD, i.e., if no one
> has ever successfully used a dataset for anything, it shouldn't be
> included. At least from my perspective, this would help prevent the
> frustration when datasets in the diagram turn out to be unusable.
>
> I'll give an example: I spotted the ietflang dataset (
> http://datahub.io/dataset/ietflang) in the diagram and was interested in
> using it since mapping between different kinds of language identifiers is
> something I have to do often. I couldn't figure out a way to get the whole
> dataset, and reading a two-year-old unanswered comment asking this very
> question didn't inspire much confidence.
>
Yeah, it seems that Datahub comments are quite useless and don't seem to
notify the resource owner; however, you could just ask the original author,
or even post to a public mailing list ;)

The 'dataset' is actually a service, and trying to generate all the data
this service could return would be impractical (the service recognizes 9.5
billion URIs). However, the source for the service is here:

https://github.com/jmccrae/rfc4646mapper

I hope it might yet prove useful to you.

Regards,
John

>
> I also think some clear exemplary use cases would help data creators
> decide whether it would be useful for their datasets to be included, i.e.,
> these use cases could make data creators *want* to have their datasets
> included.
>
> best
> robert
>
>
> On Thu, Jan 22, 2015 at 12:51 PM, Christian Chiarcos <
> christian.chiarcos at web.de> wrote:
>
>>
>>> I have struggled for quite some time to understand how the
>>> LLOD cloud is supposed to be used. So I guess my proposal would be to
>>> select data sets for inclusion based on their relevance to the relevant
>>> use cases.
>>>
>>
>> I think that's exactly the problem we need to cope with. Depending on
>> one's scientific background, these use cases and goals are very likely to
>> diverge. That's what makes finding a shared definition in an
>> interdisciplinary context quite a hard problem. I think a primary and
>> general use case for the diagram would be to help external users
>> locate LLOD data and metadata sets and assess their usability. While
>> most of these are part of the LOD cloud as well (since Aug. 2014 even with
>> their own top-level category), our efforts basically aim to provide
>> improved, linguistics-specific metadata about them, to visualize them in an
>> appealing way, and to document strategies for using this specific data and
>> the underlying technology within our community/-ies.
>>
>> But if the diagram is meant to be a means of communication towards
>> external users, we basically need to define *their use cases*, not ours. In
>> any case, we should also collect use cases (we did in the LDL and MLODE
>> [post-]proceedings and other publications, but not in a central place in
>> the wiki). After coming up with a preliminary definition of "linguistically
>> relevant", we should intensify our efforts in this direction and, if
>> necessary, revise the definition of linguistic relevance afterwards.
>>
>>> E.g., if the LLOD cloud is supposed to be a curated, aggregated data set
>>> like bio2rdf, used by linguists to do research, only data sets which can be
>>> automatically integrated into it would qualify (and AFAICT this would
>>> exclude quite a few data sets), and inclusion could be handled on a
>>> case-by-case basis upon request. If, on the other hand, the LLOD cloud is a
>>> collection of metadata mainly used to generate the LLOD cloud image, other
>>> selection criteria may be more appropriate.
>>>
>>
>> What do you mean by "automatically integrated"? Everything in the diagram
>> should have explicit links to other resources. If it doesn't, or it is not
>> accessible, it should not be in the diagram, but it can still have valid
>> metadata at datahub (or its successor).
>> The diagram itself is certainly not the primary motivation for working on
>> resource conversion, linking and metadata, but it is a concrete
>> manifestation of the available resources. Improving the availability
>> and usability of linguistically relevant resources for linguistic research
>> *and* natural language processing is the actual goal of our efforts,
>> along with, of course, promoting open resources as an even more general
>> goal.
>> In any case, the selection criteria depend on coming up with an
>> operational definition of linguistic relevance which permits both technical
>> and academic use cases, so everyone is welcome to specify which resources
>> s/he wants to have included or excluded and to propose a definition
>> that captures this intuition.
>>
>> Best,
>> Christian
>>
>> _______________________________________________
>> open-linguistics mailing list
>> open-linguistics at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/open-linguistics
>> Unsubscribe: https://lists.okfn.org/mailman/options/open-linguistics
>>
>>
>