No subject


Sun Dec 12 18:29:16 GMT 2010


Hello everyone.

I'm currently developing my master's thesis, which focuses on Open Data, more specifically on fetching, parsing and publishing Portuguese legislation.
This involves PDF parsing, which I believe many of you have struggled with in your own projects.
I don't know if this is the best way to ask for help, but if not, I'm sure someone will let me know.
My problem involves parsing double-column PDFs (for example, http://dre.pt/pdfgratis/2011/01/00200.pdf).

Has anyone had experience (the long hours of algorithm juggling) with this? I'm using iText for the PDF parsing.
From what I can figure, based on some comments from Kevin Day, the author of the iText text extraction sub-system, it is not a simple task, so if anyone has handled this problem, please say something :)

Cheers,
Nuno Moniz.
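For anyone landing on this thread: a naive first approach to the two-column problem is to work with positioned text fragments rather than the extractor's plain-text output, assign each fragment to a column by its x-coordinate, and read the left column top to bottom before the right one. The Java sketch below (Java 16+, for the record type) illustrates only that ordering step. It assumes PDF user-space coordinates (y grows upwards), a single known gutter position (`gutterX`, 300 points here, an arbitrary illustrative value) and a hypothetical `TextChunk` holder. It is not iText's API; collecting the fragments and their baseline coordinates from iText (for example through a custom render listener in its parser package, depending on the version you use) is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Minimal sketch: reorder positioned text fragments from a two-column page
 * into reading order (whole left column first, then the right column).
 * Assumes PDF user-space coordinates (y grows upwards) and one known gutter.
 */
public class ColumnOrder {

    /** Hypothetical fragment holder (not an iText class): text plus baseline start x/y. */
    record TextChunk(String text, float x, float y) {}

    static String readingOrder(List<TextChunk> chunks, float gutterX) {
        List<TextChunk> left = new ArrayList<>();
        List<TextChunk> right = new ArrayList<>();
        // Assign each fragment to a column by where it starts (assumed fixed gutter).
        for (TextChunk c : chunks) {
            (c.x() < gutterX ? left : right).add(c);
        }
        StringBuilder sb = new StringBuilder();
        appendColumn(left, sb);
        appendColumn(right, sb);
        return sb.toString().trim();
    }

    private static void appendColumn(List<TextChunk> column, StringBuilder sb) {
        // Top of the page first (larger y), then left to right within a line.
        column.sort(Comparator.comparingDouble((TextChunk c) -> -c.y())
                .thenComparingDouble(TextChunk::x));
        Float lastY = null;
        for (TextChunk c : column) {
            if (lastY != null) {
                // Fragments on (almost) the same baseline are joined with a space,
                // otherwise a new line is started.
                sb.append(Math.abs(c.y() - lastY) > 1f ? '\n' : ' ');
            }
            sb.append(c.text());
            lastY = c.y();
        }
        sb.append('\n');
    }

    public static void main(String[] args) {
        // Illustrative, made-up fragments for one page with two columns.
        List<TextChunk> page = List.of(
                new TextChunk("Artigo 1", 60f, 700f),
                new TextChunk("Artigo 3", 320f, 700f),
                new TextChunk("Objeto", 60f, 688f),
                new TextChunk("Entrada em vigor", 320f, 688f),
                new TextChunk("Artigo 2", 60f, 650f));
        System.out.println(readingOrder(page, 300f));
    }
}
```

This ignores the harder cases: headings or tables that span both columns would be split at the gutter, and the gutter position is hard-coded rather than detected (for instance by looking for a vertical strip of the page that contains no text).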




More information about the open-legislation mailing list