[OpenSpending] Extracting data from PDFs

Vitor Baptista vitor at vitorbaptista.com
Thu Dec 20 21:02:35 UTC 2012


Hi Lucy and Michael,

Compared to the other PDFs sent here, these are a piece of cake, but it might
be useful to have examples in a bunch of languages.

At
http://www.camara.gov.br/proposicoesWeb/prop_emendas?idProposicao=545925&subst=0,
you can get all the approved amendments to the new Brazilian Forest Code (just
click on "Inteiro teor", i.e. "full text"). For example:
http://www.camara.gov.br/proposicoesWeb/prop_mostrarintegra?codteor=1023095&filename=EMC+2/2012+MPV57112+%3D%3E+MPV+571/2012.
They're printed and scanned forms, sometimes with stamps and signatures over
the text.

Cheers,
Vítor.

2012/12/20 Lucy Chambers <lucy.chambers at okfn.org>

> Hi all,
>
> I figured you might be able to help. My colleague, Michael, is writing
> a course on Optical Character Recognition for the School of Data
> project.
>
> He's done the easy, nicely formatted PDFs. Now he's looking for some
> real-life, nasty examples of PDFs that people have to deal with.
> Probably scanned or photographed PDFs, or just really tricky ones, so
> that we get a good range of difficulty across the course.
>
> Any pointers would be very helpful; it's really nice to base these
> courses on real data that people have actually been grappling with!
>
> Lucy
>
> --
> Lucy Chambers
> Project Coordinator,
> School of Data & OpenSpending
> Open Knowledge Foundation
> Skype: lucyfediachambers
> Twitter: @lucyfedia