[open-science] New PDF Table transcription for CrowdCrafting/PyBossa
Peter Murray-Rust
pm286 at cam.ac.uk
Fri Sep 20 11:57:53 UTC 2013
Many thanks, Daniel.
Anders Pedersen and I had a constructive discussion about tables and their
taxonomy and I am looking at some of the ones he sent me. (As you know I am
looking at how machines analyze tables - I think this is directly
complementary to your application).
Many "tables" are not rectangular tables, but simply ways of laying out
information using Excel or HTML tables. Common problems are nested tables,
tables which concatenate tables, merged cells etc. Others simply change
semantics at random places. These are objectively difficult to describe! I'm
not aware of a formal classification of this problem but it would be
valuable. The Wikipedia article at
https://en.wikipedia.org/wiki/Table_%28information%29 hints at the problem.
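
As a rough illustration of the problems listed above, here is a minimal
sketch that flags three of them in an HTML table: nested tables, merged
cells (rowspan/colspan) and rows with differing cell counts. It uses only
Python's standard html.parser module. The names TableAudit and audit are
hypothetical, made up for this example, and the checks are deliberately
crude; this is not a formal classification and cannot detect semantics
that change partway through a table.

```python
from html.parser import HTMLParser


def _span(value):
    """Parse a rowspan/colspan attribute, treating bad or missing values as 1."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 1


class TableAudit(HTMLParser):
    """Hypothetical helper: collects signs that a table is not a plain grid."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # current <table> nesting depth
        self.nested = False   # a <table> was seen inside another <table>
        self.merged = False   # some cell has rowspan or colspan > 1
        self.row_widths = []  # cell count for each row of a top-level table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth > 1:
                self.nested = True
        elif tag == "tr" and self.depth == 1:
            self.row_widths.append(0)
        elif tag in ("td", "th"):
            a = dict(attrs)
            if _span(a.get("rowspan")) > 1 or _span(a.get("colspan")) > 1:
                self.merged = True
            # Only count cells that belong to a top-level table row.
            if self.depth == 1 and self.row_widths:
                self.row_widths[-1] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def audit(html):
    """Return a dict of simple non-rectangularity flags for the given HTML."""
    parser = TableAudit()
    parser.feed(html)
    parser.close()
    return {
        "nested": parser.nested,
        "merged_cells": parser.merged,
        "ragged_rows": len(set(parser.row_widths)) > 1,
    }


simple = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr></table>"
messy = (
    "<table><tr><td colspan='2'>x</td></tr>"
    "<tr><td>a</td><td><table><tr><td>n</td></tr></table></td></tr></table>"
)
print(audit(simple))
print(audit(messy))
```

A table that passes all three checks may still not be a true rectangular
table, so flags like these could at best serve as a first filter before
human transcription.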
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069