[okfn-labs] Assessment of open data in Germany

Matthew Fullerton matt.fullerton at gmail.com
Wed Oct 8 12:25:44 UTC 2014


Hi labs,
A while back Rufus asked me to update the list on some work going on at OKF
in Germany. Until the end of the year we are working on a project called
'Open Data Monitor' with university colleagues in Bremen. The goal is to
assess how far German villages, towns, cities, states and the central
government have come in opening up data:

- Is it happening at all?
- Is the data that is officially 'open' the end of the story, or are there
other departments publishing data independently (business as usual, so to
speak)?
- How easy is it to find data on a regional basis?

A large part of the project work (at our end) is extracting,
(semi-)automatically, what (open) data is on the websites of various
authorities for evaluation. Hence, a byproduct of the project is a large
collection of metadata, both from catalogs that already hold metadata
themselves (e.g. CKAN portals) and from web crawls for typical 'data'
formats, the results of which are reviewed by human beings. A more
experimental part was also to process Google and Bing search results for
'data' formats.
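
As a rough illustration of the "typical 'data' formats" step (a minimal
sketch, not the project's actual code), crawled URLs could be narrowed down
to candidates for human review by file extension. The extension list and
the function names here are assumptions made for the example.

```python
from urllib.parse import urlparse
import posixpath

# Assumed list of 'data' extensions; the project's real list may differ.
DATA_EXTENSIONS = {".csv", ".xls", ".xlsx", ".json", ".xml",
                   ".geojson", ".kml", ".shp", ".zip"}


def is_data_url(url):
    """Return True if the URL path ends in one of the assumed extensions."""
    path = urlparse(url).path  # drops the query string and fragment
    ext = posixpath.splitext(path)[1].lower()
    return ext in DATA_EXTENSIONS


def filter_data_urls(urls):
    """Keep candidate data files, dropping duplicates but preserving order."""
    seen = set()
    result = []
    for url in urls:
        if is_data_url(url) and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

A filter like this only looks at the URL, so files served without a
recognisable extension would still need a different check (for example on
the content type) or manual review.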

As this is the labs list, I'll try and stick to the technical
insights/challenges and skip the political ones (if desired, I can make a
separate summary of that, as so far most documentation is in German):

- Google searches are very limited in what file formats you can specify,
and there is no suitable API from Google for obtaining search results
- Bing (yes, Bing!) has an API. The documentation is terrible, but it has
one. It's a paid service, but you get a large number of searches for free.
And you can search by file extension and domain, so it was exactly what we
needed.
- What is lacking from both search engines is contextual information that
is crucial for constructing metadata: particularly whether the result is
still linked to and if so, where from.
- Web crawling is slow and tedious. Well, no insights there. But it does
work.
- Every catalog and (almost) every CKAN is different and can't be read from
in the same way. Probably no insights there either.
- German open data portals are very heterogeneous: across around 14 portals
we have CKAN, DKAN, terraCatalog and custom solutions integrated into
existing web sites. Providing a complete read out or dump of the data seems
to be a high priority for some and not for others.
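
To make the "search by file extension and domain" point concrete, here is a
minimal sketch of how query strings could be generated for each authority's
domain. It assumes the widely used `site:` and `filetype:` search
operators; whether a given search API accepts them in exactly this form is
an assumption, and `build_queries` is a hypothetical helper, not part of
any search API.

```python
from itertools import product


def build_queries(domains, extensions):
    """Return one query string per (domain, extension) pair.

    Assumes the search engine understands 'site:' and 'filetype:'.
    """
    return [
        f"site:{domain} filetype:{ext.lstrip('.')}"
        for domain, ext in product(domains, extensions)
    ]
```

Each resulting string (e.g. one per town website and format) would then be
sent to the search API, and the hits reviewed like the crawl results.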

For the last few months of the project, I will be working on consolidating
and automating what is still mostly catalog-specific, manually-triggered
code into a common library, and creating a DB-based infrastructure for
hosting crawled data for review. After some discussion with Friedrich, I'm
also going to try and index the contents of the actual files available so
that they can be searched as well as the metadata.
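
One way to picture the "common library" idea is a function per catalog type
that maps its records onto a shared, flat structure. The sketch below does
this for a CKAN-style package dictionary. The input keys (`name`, `title`,
`notes`, `license_id`, `resources` with `url` and `format`) follow the
usual CKAN package layout, but individual portals may deviate; the output
field names and the function name are assumptions for illustration only.

```python
def normalize_ckan_package(package, portal):
    """Map a CKAN-style package dict onto a flat, portal-independent record."""
    resources = package.get("resources") or []
    return {
        "portal": portal,
        "id": package.get("name") or package.get("id"),
        "title": (package.get("title") or "").strip(),
        "description": (package.get("notes") or "").strip(),
        "license": package.get("license_id"),
        # Upper-case the formats so 'csv' and 'CSV' count as the same format.
        "formats": sorted({r["format"].upper()
                           for r in resources if r.get("format")}),
        "urls": [r["url"] for r in resources if r.get("url")],
    }
```

Records from DKAN, terraCatalog or crawled pages would each need their own
mapping function producing the same fields, so that review and search can
work on one structure.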

The whole thing will be made available on a CKAN instance. The current
client-based prototype overview of the data (www.open-data-map.de) is
reaching its limits.

If you'd like to take a look or get involved, the code is at
github.com/okfde/odm-datenerfassung. The client is at
github.com/okfde/odmonitormap. The code is changing frequently. There may
be the possibility to pay a further Python developer until the end of 2014,
for anyone excited about this kind of thing with time on their hands.

Looking forward to questions and criticisms. I just want to stress that the
chief aim of the project is NOT to create one massive catalog to 'rule them
all'; we are just trying to see what is the most sensible way to archive
the working data of the project so that it's useful and sustainable
afterwards.

Best,
Matt