[open-science] New PDF Table transcription for CrowdCrafting/PyBossa

Daniel Lombraña González teleyinex at gmail.com
Fri Sep 13 12:38:25 UTC 2013


Hi again,

I forgot to mention that this application could be a really nice complement
to OCR for tables whose data is not perfectly transcribed. The Handsontable
library can be pre-populated with data
<http://handsontable.com/demo/prepopulate.html>, so you could OCR the
document first and then create a task in CrowdCrafting with the OCRed data.
The table is then pre-loaded for the volunteer, who can fix and improve the
OCR results :-)
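
A minimal sketch of the OCR pre-fill idea, in Python. It assumes
(hypothetically) that the OCR engine emits one table row per line with cells
separated by two or more spaces; the task field names `pdf_url`, `page` and
`table` are illustrative only, not part of PyBossa or of the template. The
resulting `table` value is an array of arrays, which is the shape
Handsontable's `data` option accepts, so the template could load it before
the volunteer starts editing.

```python
import json
import re


def ocr_text_to_rows(ocr_text: str, n_columns: int) -> list[list[str]]:
    """Split OCRed text into rows of exactly n_columns cells.

    Hypothetical input format: one row per line, cells separated by two or
    more spaces. Real OCR output will need its own parser.
    """
    rows = []
    for line in ocr_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = re.split(r"\s{2,}", line.strip())
        # Pad short rows and merge overflow cells into the last column so
        # every row has n_columns cells; volunteers fix what OCR got wrong.
        if len(cells) < n_columns:
            cells += [""] * (n_columns - len(cells))
        elif len(cells) > n_columns:
            cells = cells[: n_columns - 1] + [" ".join(cells[n_columns - 1 :])]
        rows.append(cells)
    return rows


def build_task_info(pdf_url: str, page: int, ocr_text: str, n_columns: int) -> str:
    # Hypothetical task "info" payload; the template would hand
    # info["table"] to Handsontable's `data` option (an array of arrays).
    info = {
        "pdf_url": pdf_url,
        "page": page,
        "table": ocr_text_to_rows(ocr_text, n_columns),
    }
    return json.dumps(info)
```

Rows that the OCR split badly are padded or merged to the chosen column
count rather than dropped, since correcting them is exactly the volunteer's
job.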

Cheers!


On Fri, Sep 13, 2013 at 2:35 PM, Daniel Lombraña González <
teleyinex at gmail.com> wrote:

> Hi there!
>
> Today I'm really happy to announce a new application/template for PyBossa
> that can be used on CrowdCrafting.org for transcribing tables locked in PDF
> files :-D
>
> The application is very similar to the PDF transcription one
> <http://crowdcrafting.org/app/pdftranscribe/>, as it is a new version of
> it, but it shows how you can integrate a tabular data library to format
> the transcriptions easily.
>
> The application basically loads a PDF file (which can be hosted in your
> public Dropbox folder!) and asks you how many columns the table on the page
> has, if there is a table at all. Then, if the answer is 5, a new 5-column
> table is created automatically, adding a new row every time you complete
> one! Simple and clean!
>
> Each row is stored as a list in a JSON object, which makes it really easy
> to parse and export to other formats.
>
> Here is a short YouTube video showing the app:
> http://www.youtube.com/watch?v=yfnJHALzlZc
>
> The application: http://crowdcrafting.org/app/pdftabletranscribe/
>
> And the official Tweet:
> https://twitter.com/teleyinex/status/378474287532744704
>
> NOTE: this app works best when each page contains only one table and no
> cells are merged. For other cases the template should be adapted; this is
> just a minimal version to start from. The handsontable library
> <http://handsontable.com/> is really awesome, so you can adapt it to your
> needs without problems.
>
> All the best,
>
> Daniel
>
> --
> http://daniellombrana.es
> http://citizencyberscience.net
> http://www.shuttleworthfoundation.org/fellows/daniel-lombrana/
>
> ··························································································································································
> Please do NOT use proprietary file formats to share files
> like DOC or XLS, instead use PDF, HTML, RTF, TXT, CSV or
> any other format that does not impose on the user the employment
> of any specific software to work with the information inside the files.
>
> ··························································································································································
>
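
Because each answer stores the table as a list of rows in JSON, converting
it to CSV needs only the Python standard library. This sketch assumes
(hypothetically) that the rows live under a `table` key in the answer; check
the template's actual field name before relying on it. It drops blank rows,
which can appear because a new row is added each time one is completed, and,
in line with the one-table, no-merged-cells assumption, refuses rows of
differing widths.

```python
import csv
import io
import json


def answer_to_csv(answer_json: str, key: str = "table") -> str:
    """Convert one transcription answer into CSV text.

    Assumes (hypothetically) the answer is a JSON object whose `key` field
    holds the table as a list of rows, each row a list of cells.
    """
    rows = json.loads(answer_json)[key]
    # Drop rows whose cells are all empty or null (e.g. a trailing spare row).
    rows = [row for row in rows if any(cell not in (None, "") for cell in row)]
    # One table per page and no merged cells means every row has the same
    # number of columns; reject anything ragged instead of guessing.
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"rows have differing widths: {sorted(widths)}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)  # None cells are written as empty fields
    return buffer.getvalue()
```

Running it over every exported answer and concatenating the results (minus
repeated header rows) would give one CSV per PDF.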


