[OpenSpending] Extracting data from PDFs

Lucy Chambers lucy.chambers at okfn.org
Thu Dec 20 17:57:45 UTC 2012


Ouch, thanks Ivan and Sam. Yes, I think we should bear in mind our
international audience. We'll test first with Latin characters, but
we're definitely going to have to deal with diacritics and other
languages at some point (and I've got a special place in my heart for
Cyrillic ;) )

Lucy

On Thu, Dec 20, 2012 at 5:23 PM, Ivan Begtin <ibegtin at gmail.com> wrote:
> Hi Lucy,
>   we have lots of nasty old Russian texts inside PDFs, like
> http://istmat.info/node/18484 "Russian Empire statistical calendar".
> It's really hard and nasty.
>
> Best Regards,
>   Ivan Begtin
>
>
> 2012/12/20 Lucy Chambers <lucy.chambers at okfn.org>:
>> Hi all,
>>
>> I figured you might be able to help. My colleague, Michael, is writing
>> a course on Optical Character Recognition for the School of Data
>> project.
>>
>> He's done the easy, nicely formatted PDFs. Now he's looking for some
>> real-life, nasty examples of PDFs that people have to deal with.
>> Probably scanned / photographed PDFs, or just really tricky PDFs so
>> that we get a good difficulty scale across the course.
>>
>> Any pointers would be very helpful; it's really nice to base these courses on
>> real data that people have actually been grappling with!
>>
>> Lucy
>>
>> --
>> Lucy Chambers
>> Project Coordinator,
>> School of Data & OpenSpending
>> Open Knowledge Foundation
>> Skype: lucyfediachambers
>> Twitter: @lucyfedia
>>
>> _______________________________________________
>> openspending mailing list
>> openspending at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/openspending
>> Unsubscribe: http://lists.okfn.org/mailman/options/openspending
>
>
>
>






