[okfn-be] DierenTheater, parsing (lachambre|dekamer).be (was Re: presentation)

Laurent Peuch psycojoker at gmail.com
Sat Feb 25 03:35:53 UTC 2012


Ohai,

Some news: DierenTheater now provides nl data as well as fr data.

Sadly, at the time I'm writing this the (lachambre|dekamer).be server is
down again (it's the website with the worst uptime I know), so I can't
show you the result right now.

> >You can see the result here http://dieren.vnurpa.ethylix.be/lachambre/document/
> >One example with quite a lot of data http://dieren.vnurpa.ethylix.be/lachambre/document/1825/

This often happens on weekends. I'm wondering if it's because the dude
responsible for rebooting the server is at home or because the admins
just shut down the server for the weekend.

Anyway, my next step will be to add a REST API. I don't know how much
time this will take. Django seems to come with a lot of apps to do
this, maybe I'll find one that matches what I want.

Also, I first wanted to parse the whole website before doing those 2
previous steps, but my motivation didn't follow (nothing fun seems to
be left to parse), so I'm building the API instead to confront my work
with reality and get feedback.

The currently parsed data are:
* all deputies and some related information
* all commissions and their members
* all law projects and propositions (the "documents")
* all written questions (there are more than 60 000 of them since the
  48th legislature)
* annual reports

(Note: the web interface showing the data isn't fully up to date.)

I'm not parsing any pdfs yet.

After this, it will be a choice between parsing new data and building
an intelligent automatic update strategy (doing a sequential parse of
the whole website every night seems a bit overkill, and some parts, like
the commission agendas, are likely to change way more often).
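
To make the idea concrete, here is a rough sketch of one possible update
strategy: each section of the site gets its own refresh interval, and
only the sections whose interval has elapsed are parsed again. Nothing
here is implemented in DierenTheater; the section names, the intervals
and the sections_due() helper are all assumptions made up for
illustration.

```python
from datetime import datetime, timedelta

# Hypothetical refresh intervals per section of the site. The values are
# guesses for illustration, not measurements of how often pages change.
REFRESH_INTERVALS = {
    "commission_agenda": timedelta(hours=1),
    "documents": timedelta(days=1),
    "written_questions": timedelta(days=1),
    "deputies": timedelta(days=7),
    "annual_reports": timedelta(days=30),
}


def sections_due(last_parsed, now):
    """Return the sections that need a new parse, most overdue first.

    last_parsed maps a section name to the datetime of its last
    successful parse. A section missing from it has never been parsed
    and is always considered due.
    """
    due = []
    for section, interval in REFRESH_INTERVALS.items():
        previous = last_parsed.get(section)
        if previous is None:
            # Never parsed: treat as maximally overdue.
            due.append((timedelta.max, section))
        elif now - previous >= interval:
            # Record by how much the interval has been exceeded.
            due.append((now - previous - interval, section))
    due.sort(reverse=True)
    return [section for _, section in due]
```

A cron job could call sections_due() every few minutes with the stored
timestamps and parse only what it returns, so a slow-moving section
like the annual reports is not fetched again every night.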

Have a nice weekend,

-- 

Laurent Peuch -- Bram



