[iRail] BeLaws
Koen van Besien
koen.vanbesien at gmail.com
Mon Feb 14 14:44:50 UTC 2011
Hi,
Just a side note, and a bit off-topic:
what do you use as a scraping method?
An interesting project is this one:
http://doc.scrapy.org/intro/overview.html
It is quite complete at the moment and uses Python and XPath to scrape
your data. Worth a look!
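To give an idea of what scraping with XPath looks like, here is a
minimal sketch. It does not use Scrapy itself, only Python's standard
library (xml.etree.ElementTree, which supports a limited XPath subset).
The sample page, the "law" class name and the extract_laws() helper are
made up for illustration and have nothing to do with any real site.

```python
# Sketch only: XPath-style extraction with the standard library.
# SAMPLE_PAGE, the class name "law" and extract_laws() are hypothetical.
import xml.etree.ElementTree as ET

SAMPLE_PAGE = """
<html>
  <body>
    <ul>
      <li class="law"><a href="/law/1">First example law</a></li>
      <li class="law"><a href="/law/2">Second example law</a></li>
      <li class="other"><a href="/about">About</a></li>
    </ul>
  </body>
</html>
"""


def extract_laws(page):
    """Return (title, href) pairs for each link inside an <li class="law">."""
    root = ET.fromstring(page)
    # Attribute predicates like [@class='law'] are part of the XPath
    # subset that ElementTree understands.
    links = root.findall(".//li[@class='law']/a")
    return [(a.text, a.get("href")) for a in links]


if __name__ == "__main__":
    for title, href in extract_laws(SAMPLE_PAGE):
        print(title, href)
```

Note that this only works because the sample is well-formed XML. Real
pages rarely are, which is one reason to reach for a proper scraping
library instead.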
If we look beyond rail info only:
www.openbiz.be is another attempt at a better interface
for looking up company data. (It is very incomplete and was made in one
weekend, not by me btw.) But it shows the power of what you can do if
the government is not doing it.
http://www.ejustice.just.fgov.be/tsv_pub/index_n.htm
Kind regards,
Koen
On Sat, Feb 12, 2011 at 5:01 PM, Pieter Colpaert
<pieter.colpaert at gmail.com> wrote:
> Hi list!
>
> I had trouble looking up some Belgian laws, so I decided to scrape them
> and make my own "one-field-google-like" site. As I use it for the NPO and
> some people were interested in it, I thought it would be interesting
> to share my results so far:
>
> The project is at http://github.com/iRail/BeLaws
>
> 1. Scraping: The scraping is done. All the laws are downloaded on our
> server. You can download them yourself (which will take very long) using
> the fetcher script in the scraper directory.
> 2. Parsing: Tim Esselens will write a Perl script to parse the HTML
> files into a more readable format. He will add an API to it so that
> everyone can write their own applications for looking up the law.
> 3. Hosting: The project will be hosted at belaws.iRail.be when it's
> ready.
> 4. Interface: We put everything in Apache Lucene (which is an indexer
> and full-text search engine). I'm writing a Java Servlet for it. You can
> see the current results in the attached screenshots.
>
> Pieter
>
> --
> +32 (0) 486 74 71 22
> iRail vzw/asbl
>
> http://project.iRail.be
>
--
Koen van Besien
0485 68 29 28