[open-linguistics] Linguistic LOD cloud - help needed, now is the time to submit your data set

Fri Aug 3 10:39:38 UTC 2012

by the powers vested in me, I hereby grant the title of

OFFICIAL DOCUMENTER of the LINGUISTIC LINKED OPEN DATA CLOUD, ITS INNER  
WORKINGS and OUTER RELATIONS (ODLLOPDCIIWOR)

to

Dr JOHN MCCRAE

in line with the decision of the honourable telco committee of selecting  
dedicated scholars as title bearers for important roles
Yours truly
Sebastian N


On Fri, 03 Aug 2012 12:29:17 +0200, John McCrae  
<jmccrae at cit-ec.uni-bielefeld.de> wrote:

> Hi all,
>
> I did the analysis independently to try to figure out why Sebastian H had
> labelled so many resources as "fail". I found that most are actually not in
> a terrible state but had a few issues (especially CKAN entries).
>
> One point though: I would say that resources such as GOLD should be
> included however... GOLD consists of nearly 600 identifiers and is far from
> a trivial resource. Moreover, one of the key benefits of LLD is the ability
> to agree on data categories by linking to web resources. I view this as a
> key "selling point" of LLD and would strongly campaign for keeping it ;)
>
> It does seem from Sebastian N's comments that there is a documentation
> issue here: we need clearer documentation of the procedure (I have
> attended a lot of the telcos, and if I can't figure it out, no-one outside
> will be able to). I would be happy to help with this... (but obviously I am
> not totally clear on the procedures)
>
> I didn't use the Google Doc as it appears to be out-of-date (many of the
> resources in the diagram are not in the document)... should I integrate
> into the Google Doc, or perhaps we could move all documentation to the Wiki
> so it is easier to find?
>
> Regards,
> John
>
> On Fri, Aug 3, 2012 at 11:03 AM, Sebastian Nordhoff <
> sebastian_nordhoff at eva.mpg.de> wrote:
>
>> Dear all,
>> there seems to be some confusion with regard to documentation practice.
>> Some members of this list are closer to the inner workings of the LOD cloud
>> than others and are aware of many implicit assumptions and much shared
>> knowledge that other people are unaware of.
>> It would probably be good to list the relevant documents and processes
>> again. RTFM is OK, but you have to know where the M is.
>> Finally, I would like to commend John for being BOLD in the Wikipedia
>> sense. Not knowing the precise rules should not bar anyone from
>> contributing, and I would like to ask John to continue contributing with
>> whatever knowledge of the rules and procedures he has or lacks.
>> Best
>> Sebastian N
>>
>>
>>
>>
>>
>> On Fri, 03 Aug 2012 10:24:49 +0200, Sebastian Hellmann
>> <hellmann at informatik.uni-leipzig.de> wrote:
>>
>>> Hi John,
>>>
>>> On 02.08.2012 15:19, John McCrae wrote:
>>>
>>>> Hi all,
>>>>
>>>> I decided to do an independent evaluation of what was in the LLOD, to
>>>> identify what needs to be done, and found that the situation isn't
>>>> perhaps as bad as the previous email suggests.
>>>>
>>> Sorry, John. The only thing you did is soften the criteria for
>>> inclusion. That doesn't make the data better. You even went so far as to
>>> disregard the criteria imposed by current practice:
>>> http://richard.cyganiak.de/2007/10/lod/#how-to-join
>>> A CKAN entry is required; without one, the resource is a "fail".
>>>
>>>> My notes are here:
>>>>
>>>> http://wiki.okfn.org/Working_Groups/linguistics/Resources_in_the_cloud
>>>>
>>> Well, that is a nice table, but rather pointless. Please concentrate on
>>> maintaining the group resources at:
>>> http://thedatahub.org/en/group/linguistics
>>> or
>>> https://docs.google.com/spreadsheet/ccc?key=0AlMk5ouIspH1dGx1R1Rnd1ZXX0xmLXppSWFrcm0wNFE&authkey=CJi9u78D&authkey=CJi9u78D#gid=0
>>>
>>>
>>>> The following resources appeared to be acceptable (i.e., they exist, have
>>>> RDF, contain some useful data and had links to some other resource or to
>>>> data categories)
>>>>
>>> softening criteria
>>>
>>>>
>>>>     - Cornetto
>>>>     - WOLD
>>>>     - W3C WordNet
>>>>     - DBPediaWiktionary
>>>>     - LemonWiktionary*
>>>>     - LemonWordNet*
>>>>     - Open Data Thesaurus**
>>>>     - DBPedia**
>>>>     - YAGO
>>>>     - Localized DBPedias**
>>>>     - OpenCyc
>>>>     - GOLD***
>>>>     - ISOcat***
>>>>     - Lexvo
>>>>     - Lingvoj
>>>>     - Glottolog/LingDoc*
>>>>
>>>> * Sebastian has indicated that these resources may be buggy. There are no
>>>> issues here <http://code.google.com/p/mlode/issues/list> that make them
>>>> unusable however so I count them as good.
>>>>
>>> LemonWiktionary and Glottolog have 18 issues total, which is good.
>>> Sebastian Nordhoff already fixed 4 bugs for Glottolog, making it much
>>> better and removing the "fail".
>>> Let's work on the data, not on lowering expectations.
>>>
>>>> ** DBpedia and Open Data Thesaurus are not primarily linguistics
>>>> resources, should they be included in the LLOD cloud?
>>>>
>>> My definition would include "anything that is useful for NLP" as well.
>>> Besides you have redirects.
>>>
>>>> *** IMHO categories and schematic information resources are a vital part
>>>> of the LLOD cloud, I can't understand why Sebastian suggests they should
>>>> not be included!?
>>>>
>>> copying behaviour from http://lod-cloud.net/
>>> We can do schemas extra, if you want to.
>>>
>>>> The following resources need to be entered into CKAN: (6/27)
>>>> <snip>
>>>>
>>>> The following resources should be removed (at least for the time being)
>>>> from the cloud diagram: (5/27)
>>>> <snip>
>>>>
>>>> The following resources need attention: (4/27)
>>>> <snip>
>>>>
>>> That is a total of 15, I counted 18.
>>>
>>>> So in summary, out of the 27 bubbles in the LLOD cloud, 17 are usable and 4
>>>> can likely be quickly fixed. I have attached a version of the LLOD cloud
>>>> annotated with these results. Please edit the Wiki page if you feel I have
>>>> got something wrong.
>>>>
>>> Please concentrate on editing CKAN or the Google spreadsheet, and submit
>>> your data set to Google Code.
>>> We are working on creating updates of the cloud based on CKAN.
>>> @John, please read:
>>> http://richard.cyganiak.de/2007/10/lod/#how-to-join
>>> http://wiki.okfn.org/Wg/linguistics/llod#How_to_contribute
>>> LemonWordnet, for example, needs 50 links to an existing resource. Jimmy
>>> O'Regan was so kind as to create that for you:
>>> http://code.google.com/p/mlode/issues/detail?id=34
>>>
>>> Kind regards,
>>> Sebastian
>>>
>>>
>> _______________________________________________
>> open-linguistics mailing list
>> open-linguistics at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-linguistics