[open-science] fyi - Using OpenCV & Tesseract open source OCR for equation recognition

Mon Feb 4 20:21:33 UTC 2013

On Mon, Feb 4, 2013 at 2:42 PM, Bryan Bishop <kanzure at gmail.com> wrote:

> On Mon, Feb 4, 2013 at 12:42 PM, Tom Morris <tfmorris at gmail.com> wrote:
>
>>
>> http://ayoungprogrammer.blogspot.ca/2013/01/part-3-making-ocr-for-equations.html
>>
>
> Is there an open source library (possibly using tesseract+opencv or gnu
> gift) that can help with extracting metadata from journal articles, or
> bibliographic items? I would rather look at something that already exists
> instead of writing it on my own (something I see myself doing eventually).

It's already a pretty hard problem working from PDFs without adding the
noise from OCR as well.

For PDFs there's TeamBeam, which claims to be licensed under AGPL 3, but the
source repository is password protected:

http://knowminer.know-center.tugraz.at/team-beam-meta-data-extraction-from-scientific-literature
http://www.dlib.org/dlib/july12/kern/07kern.html

Tom