[okfn-labs] Best practice for OCR workflows (re OED 1st edition project)

Tom Morris tfmorris at gmail.com
Thu Jul 11 17:48:48 UTC 2013


The Early Modern OCR Project (EMOP) has some interesting writeups of how they
use Tesseract for their work.  The OED isn't as old (EMOP focuses on
1475-1800), but some of their notes may be useful in other contexts.

http://emop.tamu.edu/
http://idhmc.tamu.edu/blog/2013/07/10/ocr-tips-and-tricks-from-emop/

Sadly, one of the tools in their tool chain, Aletheia from
Salford/Manchester, is closed source and licensed for non-commercial use
only (despite being EU-funded research!).  EMOP does plan to distribute
their own tools as open source (not yet, but soon).

Tom



On Mon, Jun 24, 2013 at 11:40 AM, Tom Morris <tfmorris at gmail.com> wrote:

>
> On Mon, Jun 24, 2013 at 7:36 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>
>> On 21 June 2013 21:45, Tom Morris <tfmorris at gmail.com> wrote:
>>
>> Is there a way to get the ABBYY version direct from the Archive online
>> or would one need to ask them specially?
>>
>
> The ABBYY version is one of the formats in the directory.  Look for the
> file that ends in _abbyy.gz.  There's also a torrent containing all the
> files if that's easier.
>
> Tom
>
>