[School-of-data] 'Tabula helps you liberate data tables trapped inside evil PDFs'
pm286 at cam.ac.uk
Sat Apr 6 21:45:47 UTC 2013
On Sat, Apr 6, 2013 at 10:01 PM, Tom Morris <tfmorris at gmail.com> wrote:
> Tabula's principal contribution seems to be the web uploading
> interface and queuing mechanism. It doesn't do page segmentation or
> table identification and its table processing seems somewhat
Everyone starts somewhere. For me the main virtue is that it's Open,
> People who are interested in this will likely be interested in the
> ICDAR 2013 Table competition which is underway now and its associated
Thanks for this. I would certainly be interested in any code which was
openly re-usable. It's valuable to have more than one tool anyway.
> Correct, they don't do any page segmentation or table identification.
> The table boundaries need to be hand-drawn for each table and the
> resulting CSV data copied individually. It would be pretty tedious
> for a paper with lots of tables.
But this combines very well with ami2 (bitbucket.org/svg2xml-dev) which
does page segmentation and can identify tables from captions. So between
the two we are a long way down the road.
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK