[OpenSpending] Extracting data from PDFs

Lucy Chambers lucy.chambers at okfn.org
Thu Dec 20 11:17:35 UTC 2012


Hi all,

I figured you might be able to help. My colleague, Michael, is writing
a course on Optical Character Recognition for the School of Data
project.

He's done the easy, nicely formatted PDFs. Now he's looking for some
real-life, nasty examples of PDFs that people have to deal with.
Probably scanned or photographed PDFs, or just really tricky ones, so
that we get a good range of difficulty across the course.

Any pointers would be very helpful. It's really nice to base these
courses on real data that people have actually been grappling with!

Lucy

-- 
Lucy Chambers
Project Coordinator,
School of Data & OpenSpending
Open Knowledge Foundation
Skype: lucyfediachambers
Twitter: @lucyfedia



