[open-linguistics] Criteria for Inclusion in LLOD

Thu Aug 9 04:17:53 UTC 2012

I absolutely support the proposal to discuss the criteria, and especially
how to manage and plan the long-anticipated gradual transition from the
current, very loose criteria of the LLOD *draft* to the more rigid ones of
the final diagram.

The basis for discussing the draft criteria should be the criteria that are
currently applied:
- http://wiki.okfn.org/Wg/linguistics/llod#How_to_contribute

The final criteria should certainly be no looser than those of the LOD
cloud, with the added constraint that a resource must be a linguistic
resource (whatever "linguistic resource" means in this context).

In the LREC paper, we announced a transition from draft to official status
within the next two years, but this estimate may have been pessimistic
given the current efforts ;)

Probably the least controversial transition criterion would be to agree on
final LLOD criteria and then apply a threshold to the number of resources
and links in the draft cloud that conform to these criteria, say 30
relatively bug-free resources, each linked to at least 2 other bubbles. For
motivation, we could maintain a progress bar in the wiki, e.g., based on
John's page.
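The proposed threshold could be checked mechanically to feed such a
progress bar. The sketch below is purely hypothetical: the `Resource`
record, the `conformant` flag (a stand-in for the final criteria, which are
not yet defined), and the rule that only links to *other* bubbles in the
draft diagram count are all assumptions, not an agreed procedure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical record for one bubble in the draft diagram.
    name: str
    conformant: bool  # ASSUMPTION: stands in for the (undefined) final LLOD criteria
    links: set[str] = field(default_factory=set)  # names of bubbles this one links to


def qualifying(resources: list[Resource], min_links: int = 2) -> list[str]:
    """Names of conformant resources linked to at least `min_links` other bubbles."""
    names = {r.name for r in resources}
    result = []
    for r in resources:
        # ASSUMPTION: only links to other bubbles in the cloud count; self-links are ignored.
        targets = (r.links & names) - {r.name}
        if r.conformant and len(targets) >= min_links:
            result.append(r.name)
    return result


def progress_bar(count: int, target: int = 30, width: int = 30) -> str:
    """Plain-text progress bar toward the threshold, e.g. for a wiki page."""
    filled = min(count, target) * width // target
    return "[" + "#" * filled + "." * (width - filled) + f"] {count}/{target}"


if __name__ == "__main__":
    cloud = [
        Resource("a", True, {"b", "c"}),       # conformant, 2 links: counts
        Resource("b", True, {"a"}),            # conformant, only 1 link: does not count
        Resource("c", False, {"a", "b", "x"}), # not conformant; "x" is outside the cloud
    ]
    done = qualifying(cloud)
    print(done, progress_bar(len(done)))
```

Whether link targets must themselves be conformant, and whether "bug-free"
can be tested automatically at all, are open questions this sketch simply
sidesteps.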

> Open issues are:
> 1. Does anything speak against adopting:  
> http://richard.cyganiak.de/2007/10/lod/#how-to-join . The 50 link  
> threshold is quite arbitrary.
> 2. Do we require the data to be "open"?

We should discuss technical criteria and licensing issues separately.

> 3. Shall we include schema? E.g. DBpedia Ontology, GOLD, POWLA, etc.

As I wrote before, GOLD is not actually a schema. Of the resources
currently in the cloud, the only clear example of a schema is POWLA (and
probably lemon source). I would not insist on including POWLA itself, as
long as POWLA corpora are present.

> 4. a) What counts as Linguistic data set and what not?
> 4. b) Should we include any other data sets from  
> http://richard.cyganiak.de/2007/10/lod/imagemap.html

Personally, I have always understood the LLOD effort to be about creating
an LOD (sub-)cloud of linguistic resources,* so I see no reason not to
include these if they are linguistically relevant (although what counts as
"linguistically relevant" still needs to be defined).

Christian

* Although the LLOD can be regarded as part of the LOD cloud, a separate
diagram is advisable. First, it is easier for people interested in
linguistic resources to find them there than in the LOD diagram. Second, if
we treat the LLOD as an LOD sub-cloud, a generic converter from corpora to
RDF could eventually mean that *any* annotated corpus can be added to the
cloud, and the LOD creators may find themselves swamped by RDF corpora and
tighten their inclusion criteria to favor traditional knowledge bases. We
therefore have good reason to develop a separate diagram.