No subject
Sun Dec 12 18:29:16 GMT 2010
Hello everyone.

I'm currently working on my master's thesis, which focuses on Open Data, more specifically on fetching, parsing and publishing Portuguese legislation.
This involves PDF parsing, which I believe many of you have had trouble with in your projects.
I don't know if this is the best way to request help, but if not, certainly someone will warn me.

My problem involves parsing double-column PDFs (example: http://dre.pt/pdfgratis/2011/01/00200.pdf).
Has anyone had experience (the long hours of algorithm juggling) with this? I'm using iText for the PDF parsing.
From what I can figure, based on some comments from Kevin Day, who is the author of the iText text extraction sub-system, it is not a simple task, so I would ask anyone who has handled this problem to say something :)

Cheers,
Nuno Moniz.
More information about the open-legislation
mailing list