[Open-Legislation] Request for help - Portuguese legislation

Nuno Moniz nunompmoniz at gmail.com
Mon Sep 12 20:20:18 UTC 2011


Hello everyone.

I am currently working on my master's thesis, which focuses on Open Data,
more specifically on fetching, parsing and publishing Portuguese
legislation.
This involves PDF parsing, which I believe many of you have had trouble with
in your own projects.
I don't know if this is the best way to ask for help; if it is not, I am sure
someone will let me know.
My problem involves parsing double-column PDFs (example:
http://dre.pt/pdfgratis/2011/01/00200.pdf).
Has anyone had experience (the long hours of algorithm juggling) with this?
I'm using iText for the PDF parsing.
From what I can gather, based on some comments from Kevin Day, the
author of the iText text extraction sub-system, it is not a simple task, so I
would ask anyone who has handled this problem to say something :)
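
To make the problem concrete, here is a minimal sketch of the naive approach
to reading order on a two-column page. It is written in Python for brevity and
does not use iText at all: it assumes the text chunks have already been
extracted with their positions, and the names Chunk, column_text and
two_column_text are hypothetical. It also assumes the gutter sits at the
horizontal midpoint of the page, which real pages do not always respect.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    """Hypothetical extracted text fragment with its position on the page."""
    x: float   # left edge, in PDF points
    y: float   # baseline; PDF origin is bottom-left, so larger y is higher up
    text: str


def column_text(chunks, y_tolerance=2.0):
    """Group the chunks of one column into lines, ordered top to bottom."""
    lines = []  # each entry is [baseline_y, [chunks on that line]]
    for c in sorted(chunks, key=lambda c: -c.y):
        # Chunks whose baselines are close enough belong to the same line.
        if lines and abs(lines[-1][0] - c.y) <= y_tolerance:
            lines[-1][1].append(c)
        else:
            lines.append([c.y, [c]])
    # Within a line, read the chunks left to right.
    return [" ".join(c.text for c in sorted(row, key=lambda c: c.x))
            for _, row in lines]


def two_column_text(chunks, page_width):
    """Read the whole left column first, then the whole right column."""
    mid = page_width / 2.0  # assumption: the gutter is at the page midpoint
    left = [c for c in chunks if c.x < mid]
    right = [c for c in chunks if c.x >= mid]
    return column_text(left) + column_text(right)
```

This breaks as soon as a heading or table spans both columns, or when the
gutter is off-centre, which is where the algorithm juggling starts.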

Cheers,
Nuno Moniz.