[okfn-labs] Questions on data scraping tables within "picture" pdfs.
Hans Thompson
hans.thompson1 at gmail.com
Thu Feb 7 17:12:32 UTC 2013
Hello open data crusaders. I hope I am following the mailing list rules
properly as a newcomer and programming neophyte (conversational R, and
learning Python at the moment).
I want to build a microtasking project that takes PDF "pictures" of tables
and breaks them into rows and columns. That way each cell becomes its own
transcription task with a cell identity.
I've thought a lot about how to do this in R (because, in my personal
experience, a better QC process would be easier to implement there), but R
lacks the kind of picture manipulation tools that I suppose already exist
for Python etc.
My question: could pybossa be used to return the row and column position
in the image array where a user clicks? The user could then click on each
gap between rows and between columns, and those clicks would split the
table picture into a table of pictures.
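To make the idea concrete, here is a minimal Python sketch of the splitting
step as I picture it. It assumes the clicks have already been collected as
pixel positions; the function name cell_boxes and its argument names are
made up for illustration, and none of this is pybossa's API. Each box is
(left, top, right, bottom), which is the order Pillow's Image.crop expects,
so each box could be used to cut one cell out of the page image.

```python
def cell_boxes(width, height, col_splits, row_splits):
    """Map (row, col) -> (left, top, right, bottom) for every table cell.

    col_splits: x positions the user clicked between columns (hypothetical input)
    row_splits: y positions the user clicked between rows (hypothetical input)
    """
    # Add the image edges so the outermost cells are included.
    xs = [0] + sorted(col_splits) + [width]
    ys = [0] + sorted(row_splits) + [height]

    boxes = {}
    for r in range(len(ys) - 1):
        for c in range(len(xs) - 1):
            # The (r, c) key is the "cell identity" for the transcription task.
            boxes[(r, c)] = (xs[c], ys[r], xs[c + 1], ys[r + 1])
    return boxes


# A 300x200 pixel table picture with two column gaps and one row gap.
boxes = cell_boxes(300, 200, col_splits=[100, 200], row_splits=[100])
```

In this example the picture is cut into 2 rows by 3 columns, and
boxes[(1, 2)] is the bottom-right cell.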
Does a better tool exist for this type of task?
Thanks.
Hans Thompson