[open-science] fyi - Using OpenCV & Tesseract open source OCR for equation recognition
Carl Boettiger
cboettig at gmail.com
Mon Feb 4 20:23:17 UTC 2013
Bryan,
Could you elaborate a bit? What kind of information do you have to base
the query on, and what data do you want to get out?
You might find Greycite (http://greycite.knowledgeblog.org/) useful
(through its API; I think the source code is also available). The CrossRef API
(http://api.labs.crossref.org/) is pretty useful, particularly if you
already have the DOIs. A good number of publishers have their own API for
all their data (PLoS: http://api.plos.org/) or at least for the metadata
(Nature: http://developers.nature.com/docs/read/APIs), and most publishers
follow the Google Scholar/HighWire convention for HTML metadata
(http://scholar.google.com/intl/en-US/scholar/inclusion.html#indexing), which
any open source HTML/XML parser could help extract. Sorry if this is old
news and you have something more elaborate in mind.
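
To make the HTML metadata suggestion concrete, here is a minimal sketch in Python (standard library only) that collects the citation_* meta tags from a page. The class name, function name, and sample HTML (including its DOI) are made up for illustration, and real pages vary in which tags they actually provide.

```python
from collections import defaultdict
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        # A field such as citation_author can repeat, so keep a list per name.
        self.fields = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if name.startswith("citation_") and content:
            self.fields[name].append(content.strip())


def extract_citation_metadata(html_text):
    """Return a dict mapping each citation_* tag name to its list of values."""
    parser = CitationMetaParser()
    parser.feed(html_text)
    return dict(parser.fields)


# Made-up sample page; the title, authors, and DOI are placeholders.
SAMPLE_HTML = """
<html><head>
<meta name="description" content="not bibliographic metadata">
<meta name="citation_title" content="An Example Article">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author" content="Roe, Richard">
<meta name="citation_doi" content="10.1234/example.doi">
</head><body></body></html>
"""

if __name__ == "__main__":
    for name, values in extract_citation_metadata(SAMPLE_HTML).items():
        print(name, values)
```

If the page has a DOI tag, the value could then be passed on to one of the APIs above for fuller metadata.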
Cheers
On Mon, Feb 4, 2013 at 11:42 AM, Bryan Bishop <kanzure at gmail.com> wrote:
> On Mon, Feb 4, 2013 at 12:42 PM, Tom Morris <tfmorris at gmail.com> wrote:
>
>>
>> http://ayoungprogrammer.blogspot.ca/2013/01/part-3-making-ocr-for-equations.html
>>
>
> Is there an open source library (possibly using tesseract+opencv or gnu
> gift) that can help with extracting metadata from journal articles, or
> bibliographic items? I would rather look at something that already exists
> instead of writing it on my own (something I see myself doing eventually).
>
> - Bryan
> http://heybryan.org/
> 1 512 203 0507
--
Carl Boettiger
UC Santa Cruz
http://www.carlboettiger.info/