[School-of-data] 'Tabula helps you liberate data tables trapped inside evil PDFs'
tfmorris at gmail.com
Sat Apr 6 22:30:46 UTC 2013
On Sat, Apr 6, 2013 at 5:45 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
> On Sat, Apr 6, 2013 at 10:01 PM, Tom Morris <tfmorris at gmail.com> wrote:
>> People who are interested in this will likely be interested in the
>> ICDAR 2013 Table competition which is underway now and its associated
> Thanks for this. I would certainly be interested in any code which was
> openly re-usable. It's valuable to have more than one tool anyway.
I think the corpus is valuable independent of any of the contest
submissions. It has extracts from 67 documents containing a variety of
different table types along with ground truth information about the tables
and a scoring methodology and tool set. It allows one to take a
data-driven approach to evaluating tools and algorithms.
The one drawback of the corpus for this forum is that it's entirely
government documents from the EU & US, so it's not very representative of
scientific publications. Is there a similar corpus for journal articles
(or any effort underway to produce one)?