[open-linguistics] Collection of resources

Sebastian Nordhoff sebastian_nordhoff at eva.mpg.de
Sun Jan 16 22:10:21 UTC 2011


On Fri, 14 Jan 2011 16:11:46 +0100, Sebastian Hellmann  
<hellmann at informatik.uni-leipzig.de> wrote:

> Dear all,
> please review and amend and spread this:
> 1. Christian made a start and gave a list of data sets. We are trying to
> collect possible candidates for LOD - CKAN on:
> https://spreadsheets.google.com/ccc?key=0AlMk5ouIspH1dGx1R1Rnd1ZXX0xmLXppSWFrcm0wNFE&hl=en&authkey=CJi9u78D
>
> 2. We drew a Linking Open Data Cloud draft, it is just a vision and
> might still be incorrect (see bottom).
> There are 4 main types: Dictionary, Lexical Semantical Resources,
> Corpora and Schema/Ontologies

Dear all,
I added some resources to the spreadsheet which deal with Lesser Known
Languages, i.e. those which are not of particular interest to machine
translation and the like (probably more than 95% of the world's languages).
Obviously, the documentary status of most of these languages is pretty
bad, and we are light-years away from an annotated corpus for most of them.
Still, the information which is available about them could be Open Data.
This concerns structural information, like phoneme inventories, but also
non-structural information, like the number of speakers, the regions where
the language is spoken, and the bibliographical resources available.
Unfortunately, I will not be able to make it on Tuesday after all, but I
hope that the definition of Open Data in Linguistics will be sufficiently
broad to a) include more than just morphosyntax and the lexicon as domains
of interest and b) make sure that the world's linguistic diversity can be
adequately represented.

Best wishes
Sebastian


> There are already some datasets on: http://ckan.net/tag/linguistics and
> http://ckan.net/tag/linguistic <http://ckan.net/tag/linguistics>
> On Tuesday, we will have to fix: http://ckan.net/group/linguistics . It
> has only two data sets, which are badly described.
>
> 3. I also made some slides, which I presented today at ASV Leipzig in
> front of Heyer and Quasthoff:
> http://www.slideshare.net/kurzum/nlp2rdf
>
> Hope to see you on Monday,
> Sebastian H
>

