[open-science] fyi - Using OpenCV & Tesseract open source OCR for equation recognition

Carl Boettiger cboettig at gmail.com
Mon Feb 4 20:23:17 UTC 2013


Bryan,

Could you elaborate a bit?  What kind of information do you have to base
the query on, and what data do you want to get out?

You might find Greycite (http://greycite.knowledgeblog.org/) useful; it has
an API, and I think the source code is also available.  The CrossRef API
(http://api.labs.crossref.org/) is pretty useful, particularly if you
already have the DOIs.  A good number of publishers have their own API for
all their data (PLoS: http://api.plos.org/) or at least for the metadata
(Nature: http://developers.nature.com/docs/read/APIs), and most publishers
follow the Google Scholar/HighWire convention for HTML metadata
(http://scholar.google.com/intl/en-US/scholar/inclusion.html#indexing), which
any open source HTML/XML parser can extract.  Sorry if this is old
news and you have something more elaborate in mind.
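
To make the HTML metadata route concrete, here is a rough sketch using only
Python's standard-library HTML parser.  It assumes the page exposes
Google Scholar style <meta name="citation_..."> tags; the class name, the
function name and the sample page are all made up for illustration, and a
real page would need fetching and more careful error handling.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        # Tag name -> list of values; tags such as citation_author repeat.
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if name.startswith("citation_") and content:
            self.metadata.setdefault(name, []).append(content.strip())


def extract_citation_metadata(html_text):
    """Return a dict of citation_* meta tags found in html_text."""
    parser = CitationMetaParser()
    parser.feed(html_text)
    return parser.metadata


# Invented sample page, not taken from any real article.
SAMPLE = """
<html><head>
<meta name="description" content="not bibliographic">
<meta name="citation_title" content="An Example Article">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author" content="Roe, Richard">
<meta name="citation_publication_date" content="2013/02/04" />
</head><body></body></html>
"""

if __name__ == "__main__":
    for key, values in extract_citation_metadata(SAMPLE).items():
        print(key, values)
```

Values are kept as lists because some tags (authors in particular) appear
more than once per article.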

Cheers


On Mon, Feb 4, 2013 at 11:42 AM, Bryan Bishop <kanzure at gmail.com> wrote:

> On Mon, Feb 4, 2013 at 12:42 PM, Tom Morris <tfmorris at gmail.com> wrote:
>
>>
>> http://ayoungprogrammer.blogspot.ca/2013/01/part-3-making-ocr-for-equations.html
>>
>
> Is there an open source library (possibly using tesseract+opencv or gnu
> gift) that can help with extracting metadata from journal articles, or
> bibliographic items? I would rather look at something that already exists
> instead of writing it on my own (something I see myself doing eventually).
>
> - Bryan
> http://heybryan.org/
> 1 512 203 0507
>


-- 
Carl Boettiger
UC Santa Cruz
http://www.carlboettiger.info/