[okfn-discuss] Suggestions for datasets that you'd like to see available as linked data

Christian Chiarcos christian.chiarcos at web.de
Wed May 23 03:39:05 UTC 2012


Hi Darwin,

one data set might be the Europarl corpus, i.e., a linguistically
annotated version of EU parliamentary documents, distributed under an
open license, with word- and sentence-level alignment and 1 million
tokens for most EU languages (original corpus:
http://www.statmt.org/europarl/, annotated version:
http://opus.lingfil.uu.se/).

What distinguishes this from the Parltrack data is the linguistic
annotation (so it's really a huge chunk of structured data, not just
plain text plus metadata), the fact that the Europarl data sets are
heavily used in linguistics and NLP, e.g., in machine translation, and
that they represent a "classical" resource in a number of ways. Having
these as Linked Data would substantially promote Linked Data in
linguistics and NLP. I have already worked out most aspects of
converting the English-French language pair plus its annotations to
RDF. If you're interested in cooperating on these efforts, applying
them to other languages, and hosting the data afterwards, please let me
know.

Best,
Christian

2012/5/23 stef <s at ctrlc.hu>:
> hey,
>
> On Fri, May 18, 2012 at 02:03:04PM +0100, Darwin Peltan wrote:
>> As part of the LOD2 project[1] we are looking to transform several datasets
>> to RDF.
>>
>> Please let me know if you have any suggestions for particular datasets
>> (that are already open) that you would like to see available as RDF. These
>> datasets should be from within the EU region.
>
> i don't know if i want to see the suggested data below as linked data, but
> i'm certainly interested in experimenting with it. ;)
>
> you might want to have a look at the parltrack European parliament data:
> http://parltrack.euwiki.org/dumps - schema is here:
> http://parltrack.euwiki.org/dumps/schema.html
> also there's a json api for all objects.
>
> also you might want to have a look at some other data we liberated:
> http://data.liberit.hu/
>
> cheers,s
>
> --
> gpg: https://www.ctrlc.hu/~stef/stef.gpg
> gpg fp: F617 AC77 6E86 5830 08B8  BB96 E7A4 C6CF A84A 7140
> otr fp: https://www.ctrlc.hu/~stef/otr.txt
>
> _______________________________________________
> okfn-discuss mailing list
> okfn-discuss at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/okfn-discuss
>
