[okfn-labs] Questions on data scraping tables within "picture" pdfs.

Hans Thompson hans.thompson1 at gmail.com
Thu Feb 7 15:04:48 UTC 2013


Hello open data crusaders. I hope I am properly following the mailing list
rules as a newcomer and programming neophyte (some conversational R, and
learning Python at the moment).

I want to build a microtasking project to take PDF "pictures" of tables and
break them into rows and columns.  This way each cell can be its own
transcription task with a cell identity (its row and column).

I've thought a lot about how to do this with R (because, in my personal
experience, a superior QC process could be implemented more easily there),
but it lacks the kind of picture manipulation tools that I suppose already
exist for Python etc.

My question:  could pybossa be used to return the row and column positions
in an image array from a user's click? The user could then click on each
gap between rows and between columns, and so split the table picture into
a table of pictures.
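
To make the idea concrete, here is a minimal sketch in Python of the
splitting step only, assuming the clicks have already been collected as
pixel coordinates. The function name cell_boxes and its parameters are
hypothetical and not part of pybossa. Each box follows the
(left, upper, right, lower) convention that an image library such as
Pillow uses for cropping.

```python
def cell_boxes(width, height, col_splits, row_splits):
    """Map (row, col) to a (left, upper, right, lower) pixel box.

    col_splits: x positions clicked between columns (assumed input).
    row_splits: y positions clicked between rows (assumed input).
    """
    # Add the image edges so the outermost cells are included.
    xs = [0] + sorted(col_splits) + [width]
    ys = [0] + sorted(row_splits) + [height]
    boxes = {}
    for r in range(len(ys) - 1):
        for c in range(len(xs) - 1):
            boxes[(r, c)] = (xs[c], ys[r], xs[c + 1], ys[r + 1])
    return boxes


# Example: a 300x200 picture with two column gaps and one row gap.
boxes = cell_boxes(300, 200, col_splits=[100, 200], row_splits=[100])
```

Each key is the cell identity, and each value could be handed to a crop
call to produce the small picture for one transcription task.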

Does a better tool exist for this type of task?

Thanks.
Hans Thompson