[open-linguistics] An open linguistic resource for German & links to open data sources and tools

Adrien Barbaresi adrien.barbaresi at ens-lyon.fr
Mon Mar 12 15:01:14 UTC 2012

Dear all,

First, I would (also) like to thank you for the interesting workshop last
week in Frankfurt.

I was at the conference to introduce an open linguistic resource, though
not one in the linked data framework. Christian advised me to drop a line
about it on the mailing list, so here is a short description (apologies
for cross-posting, as I will also mention it on corpora-list):

The resource consists of speeches by the most recent German Presidents
and Chancellors as well as a few ministers, all gathered from official
sources. It provides raw data, metadata and tokenized text with
part-of-speech tags and lemmas in XML TEI format for researchers who
are able to use it, and a simple visualization interface for those who
want to get a glimpse of what is in the corpus before downloading it or
considering more complete tools.
The visualization output is valid CSS/XHTML and takes advantage of
recent standards. The purpose is to give a sort of Zeitgeist: insight
into the topics addressed by a government official and into how the use
of general concepts has evolved.
This resource is freely available under a CC BY-SA licence:

I see that open data and tools are a topic on this mailing list. The
discussion about Google Refine reminded me that I collected a lot of
links for a lecture (mainly) about open data last autumn. They are
gathered in a pearltree, which is in French but also partly in English,
so it should not be too difficult to navigate.
(NB: this website uses RDF.)


Adrien Barbaresi <adrien.barbaresi at ens-lyon.fr>
