[open-linguistics] Linguistic LOD cloud - help needed, now is the time to submit your data set

Sebastian Hellmann hellmann at informatik.uni-leipzig.de
Thu Aug 9 06:52:37 UTC 2012


Dear Christian,
I was aware of the very loose initial requirements, and I am also aware
that the barrier to 5-star linked data (http://5stardata.info/) is
high (especially since actual work is required). This was OK
as long as we assumed "draft" status for the image. Now I think the
time is ripe to make a "real" one. What has really changed is that
through MLODE we have some resources to provide help. So you might call
my criteria 'hard', but not 'unfair', and I would be the last one to refuse
help (or to try to organize help) if anybody who promised to contribute and
wants to join gets left behind.

In particular, I created the image to remind the people who *have* the
know-how and technical ability to practice what they preach and, now and
then, do something small towards removing the "draft" label from the cloud.

Regarding CKAN: to add a data set, you barely have to enter anything
at first. Since it works like a wiki, you can start with almost nothing, and
then you or other people can gradually improve the page following a "pay
as you go" or "release early, release often" methodology [1]. For
example, I created the page for OLiA just now:
http://thedatahub.org/dataset/olia
Basically I needed to add:
1. a name, i.e. "OLiA",
2. a description, i.e. "ontologies of linguistic annotation", and
3. a tag; I tagged it with "linguistics" for now.
4. Additionally, I added a "resource". In this case, it is a link to the
HTML download page: http://olia.nlp2rdf.org
(I didn't check, but even adding a resource is probably not strictly required.)
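
For anyone who would rather script this than use the web form, the fields
listed above can be sketched as a payload for CKAN's action API. This is only
a sketch under assumptions: the helper name build_dataset_payload is
hypothetical, the field names (name, title, notes, tags, resources) follow
CKAN's package_create action, and the datahub instance may expect something
slightly different. Nothing is sent over the network here.

```python
import json


def build_dataset_payload(name, description, tags=(), resource_urls=()):
    """Assemble the fields for a minimal CKAN dataset entry.

    Hypothetical helper: field names follow CKAN's package_create
    action; the target instance may require more (or less).
    """
    payload = {
        "name": name.lower(),   # CKAN dataset names are lowercase slugs
        "title": name,          # human-readable title
        "notes": description,   # free-text description
    }
    if tags:
        payload["tags"] = [{"name": tag} for tag in tags]
    if resource_urls:
        # A resource is optional; here it is just a link.
        payload["resources"] = [{"url": url} for url in resource_urls]
    return payload


# The OLiA entry described above.
olia = build_dataset_payload(
    "OLiA",
    "ontologies of linguistic annotation",
    tags=["linguistics"],
    resource_urls=["http://olia.nlp2rdf.org"],
)
print(json.dumps(olia, indent=2))
```

Submitting such a payload would be a POST to the instance's
/api/3/action/package_create endpoint with an API key, but the web form does
the same job and lets you fill in the rest later.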

So creating CKAN entries is really not that hard, and I am trying to find
out what is holding people up.

All the best,
Sebastian

[1] http://en.wikipedia.org/wiki/Release_early,_release_often


On 09.08.2012 06:14, Christian Chiarcos wrote:
> Hi Sebastian, dear all,
>
> thank you very much for the initiative and for coordinating support for
> RDF conversion of potential LLOD candidates.
>
> Just for clarification: I wouldn't describe the result of your survey as
> "shocking", but simply conformant to the (indeed!) *very loose*
> requirement currently applied for inclusion in the LLOD diagram draft.
> The requirement for the current draft was only that data providers
> *promise* RDF conversion, open publication and linking, but not
> necessarily have performed it yet. The idea was that interested colleagues
> from the group may help with conversion and/or linking as soon as the
> bubble and its potential linking have been announced (and in a few cases,
> this seems to be underway -- I'm thinking of John and Judith here).
>
> Actually, this is precisely why it is referred to as "draft", see the
> LREC paper (http://www.lrec-conf.org/proceedings/lrec2012), Sect. 4.11.
> I think everyone agrees that it would be great to shift from draft to
> official status as soon as possible, and the MLODE workshop might bring us
> a leap forward towards this goal, but at the moment, "draft" explicitly
> allows the following resources to be included:
>
>> - no RDF available
>> - no links
>> - no data online
>> - too many bugs (e.g. Glottolog)
>
> As for
>
>> - no CKAN/datahub entry (e.g. ISOcat)
>
> it would be great to have a CKAN moderator to take care of this. Until
> then, it is basically the responsibility of the data provider to take
> care of this, and this might represent an (albeit small) obstacle. We
> discussed that recently; wasn't there a volunteer? At the moment, what we
> have instead is a spreadsheet of candidates, and the task is mostly to
> transfer and update this information. However, CKAN registration also
> requires providing contact information, and this means (if deemed
> necessary) contacting the authors to ask whether they would agree to have
> their contact data published, or providing alternative contact
> information (for open resources, this could be the person who registers
> the resource).
> For me, providing contact information about third parties has been the
> reason to hesitate with the CKAN registration.
>
> The situation is different with
>
>> - only schematic information (e.g. GOLD)
>
> As for the specific case of GOLD (and ISOcat, OLiA, lexvo and lingvo), it
> certainly does not qualify as schema, but it formalizes domain-specific
> terminology (that, incidentally, happens to be relevant to NLP tools,
> although comparable repositories without NLP relevance are in existence,
> e.g. http://www.ids-mannheim.de/gra/grammis.html, see the "Ontologie zur
> deutschen Grammatik"). If the domain was not linguistic terminology, but
> pizza recipes, it would count as an independent resource, and so should
> these resources.
> Of these, only the OLiA Annotation Models may be considered to contain
> schema information (as they describe annotations in a corpus or produced
> by an NLP tool), but the OLiA Reference Model is certainly not a schema,
> but a terminology repository, and so are GOLD and (even though
> semi-structured only) ISOcat. In other words, these resources
> indeed formalize knowledge about a domain (linguistic terminology), which
> is not restricted to its use in corpora, etc., although it can be (and is)
> also applied for this purpose.
>
> All the best,
> Christian
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
   * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
   * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org




