[open-linguistics] Wiktionary RDF-extraction with DBpedia for en and de

Jonas Brekle jonas.brekle at gmail.com
Fri Dec 23 16:08:03 UTC 2011


On Thursday, 22.12.2011, at 23:18 -0800, Jonathan Pool wrote:
> The University of Washington Turing Center extracted data from Wiktionaries for the TransGraph database about 2006, but didn't publish its methods for re-use by others.
> 
> If your extractor were extended to cover word classes, definitions, and translations, I could use its output as input to PanLex and thereby better integrate Wiktionary data with data from other resources (http://utilika.org/info/plrefs.shtml).
> 
Translations will definitely be covered soon. Word classes and
definitions should already be included (possibly buggy, but mostly
working). What's the issue there?

> For word-class categories, it seems to me that the OLIF list (in 3.2.1 on page 14 of http://www.olif.net/documents/NewOLIFstruct&content.pdf) resembles the categories that generally appear in conventional lexicographic resources more closely than the GOLD list does. In PanLex, we have somewhat extended the OLIF list to:
> 
> adjv	adjective
> advb	adverb
> affx	affix
> auxv	auxiliary verb
> conj	conjunction
> detr	determiner
> ijec	interjection
> misc	miscellaneous
> name	proper noun
> noun	noun
> post	postposition
> prep	preposition
> pron	pronoun
> verb	verb
> vpar	verb particle
> 
We will use a finer granularity, following what is found in the source
(Wiktionary uses many more categories), and therefore a finer-grained
ontology. We are thinking about using OLiA [1] (which is good for
linking to other resources).
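
To make the granularity question concrete, here is a minimal sketch of how finer-grained word-class labels could be folded back into the coarse PanLex list quoted above. The code/label pairs are copied from that list; the function name `to_panlex_code` and the choice to fall back to `misc` for anything the list does not cover are assumptions made for illustration, not part of the extractor or of PanLex.

```python
# Code/label pairs copied from the PanLex word-class list quoted above.
PANLEX_WORD_CLASSES = {
    "adjv": "adjective",
    "advb": "adverb",
    "affx": "affix",
    "auxv": "auxiliary verb",
    "conj": "conjunction",
    "detr": "determiner",
    "ijec": "interjection",
    "misc": "miscellaneous",
    "name": "proper noun",
    "noun": "noun",
    "post": "postposition",
    "prep": "preposition",
    "pron": "pronoun",
    "verb": "verb",
    "vpar": "verb particle",
}

# Reverse index: label -> code.
_LABEL_TO_CODE = {label: code for code, label in PANLEX_WORD_CLASSES.items()}


def to_panlex_code(label: str) -> str:
    """Map a word-class label to a PanLex code (hypothetical helper).

    The label is lower-cased and its whitespace is collapsed before lookup.
    Labels not in the list fall back to "misc"; that fallback is an
    assumption of this sketch.
    """
    normalized = " ".join(label.lower().split())
    return _LABEL_TO_CODE.get(normalized, "misc")
```

With this, a heading such as "Proper  Noun" would resolve to `name`, while a category that only exists in the finer tag set would collapse to `misc` until an explicit mapping is added.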
> For language identifiers, I have found a combination of ISO 639-2 collective codes and ISO 639-3 and ISO 639-5 codes, supplemented by differentiators of varieties distinguished by lexicographic resources, useful identifiers (http://panlex.org/u). (Safari 5.1 opens pages like this very slowly.)
I don't really get that page. We have the Glottolog ontology (not
published yet), which covers the ISO 639-3 languages but extends them to
dialects and orders them hierarchically (by language family). This
sounds promising.

Although we may use our own resources, I think you will be able to
integrate well.

Regards, and thanks for your interest,
Jonas
[1] http://nachhalt.sfb632.uni-potsdam.de/owl/




