[OpenSpending] Extracting data from PDFs

Thu Dec 20 14:07:06 UTC 2012

On 20 December 2012 12:17, Lucy Chambers <lucy.chambers at okfn.org> wrote:

> Hi all,
>
> I figured you might be able to help. My colleague, Michael, is writing
> a course on Optical Character Recognition for the School of Data
> project.
>
> He's done the easy, nicely formatted PDFs. Now he's looking for some
> real-life, nasty examples of PDFs that people have to deal with.
> Probably scanned / photographed PDFs, or just really tricky PDFs so
> that we get a good difficulty scale across the course.
>
> Any pointers - very helpful, it's really nice to base these courses on
> real data that people have actually been grappling with!
>
>
Hi,
these are just two small examples.

In Italy, our public institutions usually publish the results of tenders
like this:
http://www.ponrec.it/media/137519/585-ric_28set12_graduatoria-smart-cities.pdf
(the worst one)
or like this:
http://www.ponrec.it/media/91323/elenco_idee_progettuali_approvate__d.d.84_ric._del_2marzo2012.pdf
(the better one)

Both are terrible if I need to manage the data.
Hope this helps
Lucia