[OpenSpending] Extracting data from PDFs
Ivan Begtin
ibegtin at gmail.com
Thu Dec 20 17:23:58 UTC 2012
Hi Lucy,
we have lots of nasty old Russian texts inside PDFs, for example
http://istmat.info/node/18484, the "Russian Empire statistical calendar".
It's really hard and nasty.
Best Regards,
Ivan Begtin
2012/12/20 Lucy Chambers <lucy.chambers at okfn.org>:
> Hi all,
>
> I figured you might be able to help. My colleague, Michael, is writing
> a course on Optical Character Recognition for the School of Data
> project.
>
> He's done the easy, nicely formatted PDFs. Now he's looking for some
> real-life, nasty examples of PDFs that people have to deal with.
> Probably scanned / photographed PDFs, or just really tricky PDFs so
> that we get a good difficulty scale across the course.
>
> Any pointers would be very helpful; it's really nice to base these courses on
> real data that people have actually been grappling with!
>
> Lucy
>
> --
> Lucy Chambers
> Project Coordinator,
> School of Data & OpenSpending
> Open Knowledge Foundation
> Skype: lucyfediachambers
> Twitter: @lucyfedia