[okfn-labs] OCR API?

Friedrich Lindenberg friedrich.lindenberg at okfn.org
Fri Oct 31 13:05:55 UTC 2014


To pile on to that: I think it's a fantastic idea. During Mozfest last
weekend, Marcos Vanetta (Open News, CC) and I decided that there should be
a standard API for NLP services so that you can write each stage of a
document ETL pipeline in a different language (if need be) and on a
different host.
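
To make that a bit more concrete, here is a minimal sketch of the idea in Python. It is not the centipede API: the stage names, the JSON field names ("text", "tokens", "token_count") and the run_pipeline helper are all hypothetical. It assumes only that every stage accepts a JSON document and returns a JSON document, so that any stage could later sit behind HTTP on another host or be rewritten in another language.

```python
import json


def tokenize_stage(payload: str) -> str:
    """Hypothetical stage: split the document text into whitespace tokens."""
    doc = json.loads(payload)
    doc["tokens"] = doc["text"].split()
    return json.dumps(doc)


def count_stage(payload: str) -> str:
    """Hypothetical stage: record how many tokens the previous stage produced."""
    doc = json.loads(payload)
    doc["token_count"] = len(doc["tokens"])
    return json.dumps(doc)


def run_pipeline(text, stages):
    """Pass a document through each stage, crossing a JSON boundary every time.

    Stages only ever see serialized JSON, which stands in for the network
    hop between hosts in a real deployment.
    """
    payload = json.dumps({"text": text})
    for stage in stages:
        payload = stage(payload)
    return json.loads(payload)


result = run_pipeline("The need for OCR comes up again", [tokenize_stage, count_stage])
print(result["token_count"])
```

Because the stages share nothing but the JSON envelope, an OCR step (say, tesseract behind a small wrapper) could be slotted in front of them without the later stages changing.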

Here's our README; pull requests with more precise ideas on the API would
be amazing: https://github.com/OpenNewsLabs/centipede

Also worth looking at are Apache Stanbol, Harlo Holmes' "Unveillance" (
https://github.com/harlo/UnveillanceCore), and IBM's UIMA (
https://uima.apache.org/).

Cheers,

- Friedrich

On Fri, Oct 31, 2014 at 2:59 PM, Rufus Pollock <rufus.pollock at okfn.org>
wrote:

> Great request, Matthew - I think an OCR API (even if just Tesseract on a
> server with a wrapper) would be quite cool.
>
> There is this existing idea issue on a PDF to Text service that includes
> some research on OCR:
>
> https://github.com/okfn/ideas/issues/52
>
> Rufus
>
> On 31 October 2014 11:51, Matthew Fullerton <matt.fullerton at gmail.com>
> wrote:
>
>> Hi okfn-labs,
>> The need for OCR comes up again and again. I suppose it's often enough
>> to run the document(s) through the program of choice (I would be
>> interested in what this is right now) and then deal with the consequences.
>> What I want to know is whether there are any do-it-yourself service/API
>> solutions out there that I could get up and running on a server to use on a
>> permanent basis?
>>
>> If there isn't one yet, I will probably be helping to cobble one together.
>> So advice, experience and expressions of interest would also be appreciated.
>>
>> Best,
>> Matt
>>
>> _______________________________________________
>> okfn-labs mailing list
>> okfn-labs at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/okfn-labs
>> Unsubscribe: https://lists.okfn.org/mailman/options/okfn-labs
>>
>>
>
>
> --
>
> Rufus Pollock
> Founder and President | skype: rufuspollock | @rufuspollock
> <https://twitter.com/rufuspollock>
> Open Knowledge <http://okfn.org/> - see how data can change the world
> http://okfn.org/ | @okfn <http://twitter.com/OKFN> | Open Knowledge on
> Facebook <https://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/>
>
> The Open Knowledge Foundation is a not-for-profit organisation.  It is
> incorporated in England & Wales as a company limited by guarantee, with
> company number 05133759.  VAT Registration № GB 984404989. Registered
> office address: Open Knowledge Foundation, St John’s Innovation Centre,
> Cowley Road, Cambridge, CB4 0WS, UK.