[open-linguistics] Collection of resources
Sebastian Nordhoff
sebastian_nordhoff at eva.mpg.de
Sun Jan 16 22:10:21 UTC 2011
On Fri, 14 Jan 2011 16:11:46 +0100, Sebastian Hellmann
<hellmann at informatik.uni-leipzig.de> wrote:
> Dear all,
> please review and amend and spread this:
> 1. Christian made a start and gave a list of data sets. We are trying to
> collect possible candidates for LOD - CKAN on:
> https://spreadsheets.google.com/ccc?key=0AlMk5ouIspH1dGx1R1Rnd1ZXX0xmLXppSWFrcm0wNFE&hl=en&authkey=CJi9u78D
>
> 2. We drew a Linking Open Data Cloud draft, it is just a vision and
> might still be incorrect (see bottom).
> There are 4 main types: Dictionary, Lexical Semantical Resources,
> Corpora and Schema/Ontologies
Dear all,
I added some resources to the spreadsheet which deal with Lesser Known
Languages, i.e. those which are not of particular interest to machine
translation and the like (probably more than 95% of the world's languages).
Obviously, the state of documentation for most of these languages is pretty
bad, and we are light years away from an annotated corpus for most of them.
Still, the information which is available about them could be Open Data.
This concerns structural information, like phoneme inventories, but also
non-structural information, like the number of speakers, the regions where
the language is spoken, and the bibliographical resources available.
Unfortunately, I will not be able to make it on Tuesday after all, but I
hope that the definition of Open Data in Linguistics will be broad enough
to a) include more than morphosyntax and the lexicon as domains of
interest and b) ensure that the world's linguistic diversity can be
adequately represented.
Best wishes
Sebastian
> There are already some datasets on: http://ckan.net/tag/linguistics and
> http://ckan.net/tag/linguistic <http://ckan.net/tag/linguistics>
> On Tuesday, we will have to fix: http://ckan.net/group/linguistics . It
> has only two data sets, which are badly described.
>
> 3. I also made some slides, which I presented today at ASV Leipzig in
> front of Heyer and Quasthoff:
> http://www.slideshare.net/kurzum/nlp2rdf
>
> Hope to see you on Monday,
> Sebastian H
>