[open-linguistics] simplify procedure for submitting your data set

Sebastian Hellmann hellmann at informatik.uni-leipzig.de
Fri Aug 3 16:38:35 UTC 2012


Hi John,

On 03.08.2012 12:29, John McCrae wrote:
> It does seem from Sebastian N's comments that there is a documentation
> issue here; we need clearer documentation of the procedure (I have
> attended a lot of the telcos as well, and if I can't figure it out, no one
> outside will be able to). I would be happy to help with this... (but
> obviously I am not totally clear on the procedures)
The procedure was made up rather ad hoc. It works well for developers 
and IT guys, who are used to issue trackers and data wikis.
You showed, however, that we are running the risk of losing a lot of 
people and data (even the tech-savvy ones).
It would be good to simplify the submission. Maybe just an email to this 
list, or to you and Martin Brümmer <bruemmer at informatik.uni-leipzig.de>.
Then we can set up the CKAN entry and the issue in MLODE and take it from there.
What do you think? It could also be on the wiki page you created, or a 
Google Form:
https://docs.google.com/spreadsheet/gform?key=0AlMk5ouIspH1dENfTnNUV0VuLVNnZVVGMEpTZ1ZrMnc&hl=en_US
https://docs.google.com/spreadsheet/viewform?formkey=dENfTnNUV0VuLVNnZVVGMEpTZ1ZrMnc6MQ

Sebastian


>
> I didn't use the Google Doc as it appears to be out-of-date (many of the
> resources in the diagram are not in the document)... should I integrate
> into the Google Doc, or perhaps we could move all documentation to the Wiki
> so it is easier to find?
>
> Regards,
> John
>
> On Fri, Aug 3, 2012 at 11:03 AM, Sebastian Nordhoff <
> sebastian_nordhoff at eva.mpg.de> wrote:
>
>> Dear all,
>> there seems to be some confusion with regard to documentation practice.
>> Some members of this list are closer to the inner workings of the LOD-cloud
>> than others and are aware of many implicit assumptions/shared knowledge
>> that other people do not know about.
>> It would probably be good to list the relevant documents and processes
>> again. RTFM is OK, but you have to know where the M is.
>> Finally, I would like to commend John for being BOLD in the Wikipedia
>> sense. Not knowing the precise rules should not ban anyone from
>> contributing, and I would like to ask John to continue contributing with
>> whatever knowledge of the rules and procedures he has or lacks.
>> Best
>> Sebastian N
>>
>> On Fri, 03 Aug 2012 10:24:49 +0200, Sebastian Hellmann
>> <hellmann at informatik.uni-leipzig.de> wrote:
>>
>>> Hi John,
>>> On 02.08.2012 15:19, John McCrae wrote:
>>>
>>>> Hi all,
>>>>
>>>> I decided to do an independent evaluation of what was in the LLOD, to
>>>> identify what needs to be done, and found that the situation isn't
>>>> perhaps
>>>> as bad as the previous email suggests.
>>>>
>>> Sorry, John. The only thing you did is soften the criteria for
>>> inclusion. That doesn't make the data better. You even went so far as to
>>> disregard the criteria imposed by the current practice:
>>> http://richard.cyganiak.de/2007/10/lod/#how-to-join
>>> A CKAN entry is required; if there is none, then "fail".
>>>
>>>> My notes are here:
>>>> http://wiki.okfn.org/Working_Groups/linguistics/Resources_in_the_cloud
>>>>
>>> Well, that is a nice table, but rather pointless. Please concentrate on
>>> maintaining the group resources at:
>>> http://thedatahub.org/en/group/linguistics
>>> or
>>> https://docs.google.com/spreadsheet/ccc?key=0AlMk5ouIspH1dGx1R1Rnd1ZXX0xmLXppSWFrcm0wNFE&authkey=CJi9u78D&authkey=CJi9u78D#gid=0
>>>
>>>
>>>> The following resources appeared to be acceptable (i.e., they exist, have
>>>> RDF, contain some useful data and had links to some other resource or to
>>>> data categories)
>>>>
>>> softening criteria
>>>
>>>>      - Cornetto
>>>>      - WOLD
>>>>      - W3C WordNet
>>>>      - DBPediaWiktionary
>>>>      - LemonWiktionary*
>>>>      - LemonWordNet*
>>>>      - Open Data Thesaurus**
>>>>      - DBPedia**
>>>>      - YAGO
>>>>      - Localized DBPedias**
>>>>      - OpenCyc
>>>>      - GOLD***
>>>>      - ISOcat***
>>>>      - Lexvo
>>>>      - Lingvoj
>>>>      - Glottolog/LingDoc*
>>>>
>>>> * Sebastian has indicated that these resources may be buggy. There are no
>>>> issues here <http://code.google.com/p/mlode/issues/list> that make them
>>>> unusable however so I count them as good.
>>>>
>>> LemonWiktionary and Glottolog have 18 issues total, which is good.
>>> Sebastian Nordhoff already fixed 4 bugs for Glottolog, making it much
>>> better and removing the "fail".
>>> Let's work on the data, not on lowering expectations.
>>>
>>>> ** DBpedia and Open Data Thesaurus are not primarily linguistics
>>>> resources,
>>>> should they be included in the LLOD cloud?
>>>>
>>> My definition would include "anything that is useful for NLP" as well.
>>> Besides, you have redirects.
>>>
>>>> *** IMHO categories and schematic information resources are a vital part of
>>>> the LLOD cloud, I can't understand why Sebastian suggests they should not
>>>> be included!?
>>>>
>>> This copies the behaviour of http://lod-cloud.net/
>>> We can do schemas separately, if you want to.
>>>
>>>> The following resources need to be entered into CKAN: (6/27)
>>>> <snip>
>>>>
>>>> The following resources should be removed (at least for the time being)
>>>> from the cloud diagram: (5/27)
>>>> <snip>
>>>>
>>>> The following resources need attention: (4/27)
>>>> <snip>
>>>>
>>> That is a total of 15, I counted 18.
>>>
>>>> So in summary, out of the 27 bubbles in the LLOD cloud 17 are usable and 4
>>>> can likely be quickly fixed. I have attached a version of the LLOD cloud
>>>> with these results attached. Please edit the Wiki page if you feel I have
>>>> got something wrong.
>>>>
>>> Please concentrate on editing CKAN or the Google spreadsheet and submit
>>> your data set to Google Code.
>>> We are working on creating updates of the cloud based on CKAN.
>>> @John, please read:
>>> http://richard.cyganiak.de/2007/10/lod/#how-to-join
>>> http://wiki.okfn.org/Wg/linguistics/llod#How_to_contribute
>>> LemonWordnet, for example, needs 50 links to an existing resource. Jimmy
>>> O'Regan was kind enough to create that for you:
>>> http://code.google.com/p/mlode/issues/detail?id=34
>>>
>>> Kind regards,
>>> Sebastian
>>>
>>>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
   * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
   * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
