[open-linguistics] Wiktionary RDF-extraction with DBpedia for en and de
Jonas Brekle
jonas.brekle at gmail.com
Fri Dec 23 16:08:03 UTC 2011
On Thursday, 22.12.2011, at 23:18 -0800, Jonathan Pool wrote:
> The University of Washington Turing Center extracted data from Wiktionaries for the TransGraph database about 2006, but didn't publish its methods for re-use by others.
>
> If your extractor were extended to cover word classes, definitions, and translations, I could use its output as input to PanLex and thereby better integrate Wiktionary data with data from other resources (http://utilika.org/info/plrefs.shtml).
>
Translations will definitely be covered soon. Word classes and
definitions should already be included (possibly buggy, but mostly
working). What's the issue there?
> For word-class categories, it seems to me that the OLIF list (in 3.2.1 on page 14 of http://www.olif.net/documents/NewOLIFstruct&content.pdf) resembles more than the GOLD list the categories that generally appear in conventional lexicographic resources. In PanLex, we have somewhat extended the OLIF list to:
>
> adjv adjective
> advb adverb
> affx affix
> auxv auxiliary verb
> conj conjunction
> detr determiner
> ijec interjection
> misc miscellaneous
> name proper noun
> noun noun
> post postposition
> prep preposition
> pron pronoun
> verb verb
> vpar verb particle
>
We will use a finer granularity (Wiktionary itself uses many more
categories) and therefore a finer-grained ontology. We are thinking
about using OLiA [1] (which is good for linking to other resources).
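
To illustrate how finer-grained labels could still be folded back into the
PanLex list quoted above, here is a minimal sketch. The table is copied from
the quoted list; the function name panlex_code and the fallback to "misc" for
unlisted labels are assumptions made for this example, not part of the
extractor or of PanLex.

```python
# PanLex word-class codes, copied from the list quoted in this thread.
PANLEX_WORD_CLASSES = {
    "adjv": "adjective",
    "advb": "adverb",
    "affx": "affix",
    "auxv": "auxiliary verb",
    "conj": "conjunction",
    "detr": "determiner",
    "ijec": "interjection",
    "misc": "miscellaneous",
    "name": "proper noun",
    "noun": "noun",
    "post": "postposition",
    "prep": "preposition",
    "pron": "pronoun",
    "verb": "verb",
    "vpar": "verb particle",
}

# Reverse index: lower-cased label -> code.
_CODE_BY_LABEL = {label: code for code, label in PANLEX_WORD_CLASSES.items()}


def panlex_code(label: str) -> str:
    """Return the PanLex code for a word-class label.

    Hypothetical helper: labels not in the quoted list fall back to
    "misc", which is an assumption made for this sketch.
    """
    normalized = " ".join(label.lower().split())
    return _CODE_BY_LABEL.get(normalized, "misc")
```

For example, panlex_code("Proper noun") yields "name", while a label that is
not in the list, such as "numeral", falls back to "misc".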
> For language identifiers, I have found a combination of ISO 639-2 collective codes and ISO 639-3 and ISO 639-5 codes, supplemented by differentiators of varieties distinguished by lexicographic resources, useful identifiers (http://panlex.org/u). (Safari 5.1 opens pages like this very slowly.)
I don't really get that page. We have the Glottolog ontology (not
published yet), which covers the ISO 639-3 languages but extends them to
dialects and orders them hierarchically (language families). This sounds
promising.
But although we may use our own stuff, I think you will be able to
integrate well.
Regards, and thanks for your interest,
Jonas
[1] http://nachhalt.sfb632.uni-potsdam.de/owl/