[open-science] Data Digitizer

Nick Barnes nb at climatecode.org
Wed Nov 23 08:37:46 UTC 2011


On Wed, Nov 23, 2011 at 05:19, Jenny Molloy <jcmcoppice12 at gmail.com> wrote:
> Thanks Jonathan!
> Sadly, the data digitiser is currently aimed at aiding manual transcription
> of tabular data from PDFs/images rather than automating the process as the
> second blog describes, which would obviously be very awesome but we quickly
> decided impossible in a day (Dd was hacked together at the Open Science
> Workshop) if not impossible full stop. I get the impression with automated
> digitisation that maintains tabular structure that many have tried extremely
> hard and all have failed thus far, although if anyone knows of any open
> projects that are getting close then let us know!

I don't know of anything in this space in the open source world, but my
experience with the proprietary "Abbyy FineReader" tells me that
automatically identifying and reproducing tabular structure is
definitely possible.
(It's an OCR system: give it a document with tables and it
produces a document with more-or-less matching tables.)
-- 
Nick Barnes, Climate Code Foundation, http://climatecode.org/



