[okfn-labs] New PDF Table transcription for CrowdCrafting/PyBossa

Daniel Lombraña González teleyinex at gmail.com
Fri Sep 13 12:35:49 UTC 2013

Hi there!

Today I'm really happy to announce a new application/template for PyBossa
that can be used in CrowdCrafting.org for transcribing tables locked in PDF
files :-D

The application is very similar to the PDF transcription one, as it is a new
version of it, but it also shows how you can integrate a tabular data library
to format the transcriptions easily.

The application basically loads a PDF file (that can be hosted in your
public Dropbox folder!) and asks you how many columns the table in the
page has, if any. Then, if the answer is, say, 5, a new table with 5 columns
will be created automatically, adding a new row every time you complete one!
Simple and clean!

Each row is stored as a list in a JSON object, making it really easy to
parse and export to other formats.
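
As a rough sketch of what such an export could look like in Python, the
snippet below turns one answer into CSV text. Note that the "rows" key, the
sample values and the rows_to_csv helper are only assumptions made up for
this example; check the actual task runs of your application for the real
structure.

```python
import csv
import io
import json

# Hypothetical answer payload: the "rows" key and the sample values are
# assumptions for illustration, not the app's documented schema.
SAMPLE_ANSWER = (
    '{"rows": [["Country", "Year", "Total"],'
    ' ["Spain", "2012", "1,234"],'
    ' ["Chile", "2013", "987"]]}'
)


def rows_to_csv(answer_json, key="rows"):
    """Convert a JSON object holding a list of row lists into CSV text."""
    rows = json.loads(answer_json)[key]
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    # Each transcribed row is already a list, so it maps directly to a CSV row.
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(SAMPLE_ANSWER)
print(csv_text)
```

The csv module takes care of quoting cells that contain commas, so values
like "1,234" survive the round trip.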

Here you have a short YouTube video showing the app:

The application: http://crowdcrafting.org/app/pdftabletranscribe/

And the official Tweet:

NOTE: this app works really well when there is only one table on each page
and no cells are merged. For other cases the template should be adapted;
this is just the minimum version to work with. The handsontable
library <http://crowdcrafting.org/app/pdftranscribe/> is really awesome, so
you can adapt it to your needs without problems.

All the best,


Please do NOT use proprietary file formats to share files
like DOC or XLS, instead use PDF, HTML, RTF, TXT, CSV or
any other format that does not impose on the user the employment
of any specific software to work with the information inside the files.
