[okfn-discuss] OCR assistance with open shakespeare

Nate Olson nate at pjsp.org
Thu Aug 30 13:39:41 UTC 2007


Rufus,

Have you had any feedback about this? Don't recall seeing any replies  
come across the list, though I could have missed something.

Nate

> Date: Tue, 14 Aug 2007 14:36:02 +0100
> From: Rufus Pollock <rufus.pollock at okfn.org>
> Subject: [okfn-discuss] OCR assistance with open shakespeare
> To: okfn-discuss <okfn-discuss at lists.okfn.org>
> Message-ID: <46C1AFC2.8070303 at okfn.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> One of the next things we want to do for open shakespeare is provide an
> open introduction to his works. The obvious idea for this was to use the
> Shakespeare entry in the 11th edition of the Encyclopaedia Britannica, as
> detailed in this ticket:
>
> http://p.knowledgeforge.net/shakespeare/trac/ticket/24
>
> I've now written code to grab the relevant tiffs off wikimedia:
>
> http://p.knowledgeforge.net/shakespeare/svn/trunk/src/shakespeare/src/eb.py
>
> You can also find them online (28 pages) starting at:
>
> http://upload.wikimedia.org/wikipedia/commons/scans/EB1911_tiff/VOL24%20SAINTE-CLAIRE%20DEVILLE-SHUTTLE/ED4A800.TIF
>
> The next step is to OCR these scans (after that we can move on to
> proofing, whether by ourselves or via http://pgdp.net). When I first had
> a stab at this back in April I tried using gocr. Unfortunately the
> results were so bad that they were unusable. Recently an old OCR engine
> of HP's has been released as open source under the name of tesseract:
>
>    http://code.google.com/p/tesseract-ocr/
>
> It looks like it might be better, though I haven't had a chance to play
> with it. I was wondering if there is anyone out there who has access to a
> decent OCR system, or has time to play with tesseract, and could have a
> go at OCRing these TIFFs?
>
> ~rufus
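
For anyone wanting to try the two steps in the quoted message (fetching the TIFFs, then running tesseract over them), here is a rough sketch. It is not the contents of eb.py. The base directory and volume name come from the Wikimedia URL quoted above, which may well have moved since. Only the first page filename (ED4A800.TIF) is known, so the names of the other 27 pages are left to the caller. The function names and the eb1911 working directory are hypothetical. The sketch relies on tesseract's basic command-line form, `tesseract imagename outputbase`, which writes the recognised text to outputbase.txt.

```python
import subprocess
import urllib.parse
import urllib.request
from pathlib import Path

# Base directory and volume name taken from the quoted Wikimedia URL
# (assumption: the layout is unchanged).
BASE = "http://upload.wikimedia.org/wikipedia/commons/scans/EB1911_tiff/"
VOLUME = "VOL24 SAINTE-CLAIRE DEVILLE-SHUTTLE"


def page_url(filename, volume=VOLUME):
    """Build the download URL for one page TIFF, percent-encoding spaces."""
    return BASE + urllib.parse.quote(volume) + "/" + urllib.parse.quote(filename)


def tesseract_command(tiff_path, out_base):
    """Command line for tesseract; it writes its output to <out_base>.txt."""
    return ["tesseract", str(tiff_path), str(out_base)]


def fetch_and_ocr(filename, workdir="eb1911"):
    """Download one page (if not already cached) and OCR it with tesseract.

    Hypothetical helper: requires network access and a tesseract binary
    on the PATH. Returns the recognised text.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    tiff = workdir / filename
    if not tiff.exists():
        urllib.request.urlretrieve(page_url(filename), str(tiff))
    out_base = tiff.with_suffix("")  # e.g. eb1911/ED4A800
    subprocess.run(tesseract_command(tiff, out_base), check=True)
    return out_base.with_suffix(".txt").read_text()
```

Calling fetch_and_ocr("ED4A800.TIF") would leave eb1911/ED4A800.txt next to the cached TIFF, ready for proofing. Downloading and OCR are kept separate so that pages can be re-OCRed (for instance with different tesseract settings) without fetching them again.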



