[Open-Legislation] weurstchen - oeil liberated

Sun Mar 13 00:26:43 UTC 2011

hey,

check out this:
http://gitorious.org/weurstchen/weurstchen/

it's all still a bit rough, but it's a start. README.rst copied below:

Weurstchen
==========

Weurstchen is a monitor for the European Parliament.

Weurstchen consists of two components:
- the scraper, which loads the data from europarl.europa.eu/oeil/
- and the web service, which publishes the data.

Change tracking
===============
http://robat.euwiki.org/changes/ shows you the latest changes.
you can restrict the output to a certain subject:
http://robat.euwiki.org/changes/3.30.25 (subject 3.30.25 is internet related stuff)

the nice thing is, you can also get the above output in
- raw json
- and atom format

just append ?format=<json|atom> to the above urls, like so:
http://robat.euwiki.org/changes/3.30.25?format=atom
http://robat.euwiki.org/changes/3.30.25?format=json
http://robat.euwiki.org/changes?format=atom
http://robat.euwiki.org/changes?format=json

you'll get the hang of it.
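
if you want to script this, here is a minimal sketch in Python 3 that builds the URLs shown above and decodes the json variant. The helper names (`feed_url`, `fetch_json`) are made up for this example and are not part of weurstchen; the layout of the returned json is not documented here, so the sketch does not assume any, and UTF-8 encoding of the response is an assumption as well.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE = "http://robat.euwiki.org"  # host taken from the URLs above


def feed_url(view="changes", subject=None, fmt=None):
    """Build a URL for a view, optionally filtered by subject and format.

    Hypothetical helper: it only mirrors the URL patterns listed above.
    """
    url = "%s/%s" % (BASE, view)
    if subject:
        url += "/" + quote(subject)
    if fmt:
        if fmt not in ("json", "atom"):
            raise ValueError("format must be 'json' or 'atom'")
        url += "?" + urlencode({"format": fmt})
    return url


def fetch_json(url):
    """Download a ?format=json URL and decode it.

    Assumes the response is UTF-8 encoded; the structure of the
    decoded data is left to the caller.
    """
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

for example, `feed_url(subject="3.30.25", fmt="atom")` gives the first URL in the list above, and `fetch_json(feed_url(fmt="json"))` would fetch the unfiltered changes.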

Radar
=====
all the different issues that are active and have a forecasted date can be seen on the radar:
http://robat.euwiki.org/radar
this can also be filtered by subject:
http://robat.euwiki.org/radar/3.30.25

The shenanigans with the '?format=<json|atom>' parameter are also available with the radar.
By default the radar lists the events in the order in which they were published on OEIL; if you want to see the dates in strict ascending order, append a 'strict' parameter to the URL.

Displaying Dossiers
===================

You can also display specific dossiers, simply by using their reference:
- http://robat.euwiki.org/RSP/2011/2588
- http://robat.euwiki.org/INI/2010/2306
- http://robat.euwiki.org/COD/2008/0249
- http://robat.euwiki.org/COD/2006/0167

should work automatically.
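
a small sketch of how a reference could be turned into such a URL from a script. The function name `dossier_url` is hypothetical. It assumes three-letter procedure codes and four-digit year and number, as in the examples above, and it also accepts the `2008/0249(COD)` notation, on the assumption that this is how the reference is written elsewhere.

```python
import re

BASE = "http://robat.euwiki.org"  # host taken from the URLs above

# "COD/2008/0249", the form used in the URLs above
_PATH_STYLE = re.compile(r"^([A-Z]{3})/(\d{4})/(\d{4})$")
# "2008/0249(COD)", assumed alternative notation for the same reference
_PAREN_STYLE = re.compile(r"^(\d{4})/(\d{4})\(([A-Z]{3})\)$")


def dossier_url(ref):
    """Return the dossier URL for a reference in either notation."""
    ref = ref.strip()
    m = _PATH_STYLE.match(ref)
    if m:
        kind, year, num = m.groups()
    else:
        m = _PAREN_STYLE.match(ref)
        if not m:
            raise ValueError("unrecognised dossier reference: %r" % ref)
        year, num, kind = m.groups()
    return "%s/%s/%s/%s" % (BASE, kind, year, num)
```

anything that matches neither pattern raises a `ValueError` instead of producing a guessed URL.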

Filtering by subjects
=====================
For a complete list of subjects on which filtering is available, head over to:
http://robat.euwiki.org/subjects

Simple Search
=============
To search the titles of the dossiers, try:
http://robat.euwiki.org/search/internet

HCalendar microformat
=====================
All dates and events are formatted as hCalendar (hcal), so if you have some semantic tools you can scrape them easily from the page. This is especially useful for importing the dates into Google or other calendars, e.g. using the Firefox Operator plugin.
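
if you have no such tool at hand, a rough sketch with nothing but the Python 3 standard library might look like the following. The class and function names are invented for this example, and the sample markup is hypothetical: it follows the usual hCalendar conventions (`vevent`, `summary`, and `dtstart` carried in an `abbr` title), but the markup the site really emits may differ. The parser is deliberately simplistic and assumes the summary element contains plain text only.

```python
from html.parser import HTMLParser


class HCalParser(HTMLParser):
    """Collect summary and dtstart of every element with class 'vevent'."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._in_summary = False  # True while inside a 'summary' element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vevent" in classes:
            # every vevent root starts a new event record
            self.events.append({"summary": "", "dtstart": None})
        if not self.events:
            return
        if "dtstart" in classes:
            # assumes the machine-readable date sits in the title attribute
            self.events[-1]["dtstart"] = attrs.get("title")
        if "summary" in classes:
            self._in_summary = True

    def handle_endtag(self, tag):
        # simplification: any closing tag ends the summary text
        self._in_summary = False

    def handle_data(self, data):
        if self._in_summary:
            self.events[-1]["summary"] += data


def parse_events(html):
    """Return a list of {'summary', 'dtstart'} dicts found in html."""
    parser = HCalParser()
    parser.feed(html)
    parser.close()
    return parser.events


# hypothetical markup, not copied from the site
SAMPLE = """
<div class="vevent">
  <abbr class="dtstart" title="2011-03-13">13 Mar 2011</abbr>
  <span class="summary">Plenary vote</span>
</div>
"""
```

`parse_events(SAMPLE)` yields one event with the summary text and the date from the title attribute; a real hCalendar library would handle nesting and the other properties properly.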

The scraper
===========
 - for a fast update (preferred):
   python scraper/oeil.py
 - for a full update (be gentle and use the fast update):
   python scraper/oeil.py full

by default both modes start 8 parallel threads, which download the data and feed the whole thing into mongodb.
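
to illustrate the general pattern only (this is not the code in scraper/oeil.py), here is a sketch of fetching with a fixed pool of 8 worker threads in Python 3. The `crawl` function and its `fetch` and `store` callables are hypothetical placeholders: in a real scraper `fetch` would download and parse a page and `store` would write the result to mongodb.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(urls, fetch, store, workers=8):
    """Fetch every URL on a pool of worker threads and store each result.

    fetch(url) returns the data for one URL; store(url, data) saves it.
    Both are supplied by the caller (placeholders in this sketch).
    Returns the list of processed URLs in input order.
    """
    def work(url):
        store(url, fetch(url))
        return url

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs work() concurrently but yields results in input order
        return list(pool.map(work, urls))
```

a dry run without any network access could look like `crawl(["a", "b"], fetch=str.upper, store=results.__setitem__)` with `results = {}`.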

Dependencies
============
- mongodb
- django
- pymongo
- lxml
- BeautifulSoup
- utidylib

Thanks
======
Smari McCarthy - category browsing and an awesome logo/skin

-- 
gpg: https://www.ctrlc.hu/~stef/stef.gpg
gpg fp: F617 AC77 6E86 5830 08B8  BB96 E7A4 C6CF A84A 7140