[okfn-labs] PyBossa for cultural heritage transcription/description?

Daniel Lombraña González teleyinex at gmail.com
Tue Nov 27 08:10:02 UTC 2012


Dear all,

For PDF transcription I have recently created a new PyBossa *demo app*, which
you can find and test at
crowdcrafting.org<http://crowdcrafting.org/app/pdftranscribe>.
The application is deliberately simple: its purpose is to serve as a template
for building more interesting PDF transcription applications :-)

The application loads an external PDF file in the web browser
(without using any third-party plugin or modifying the PDF), and each
page becomes a task (this can be adapted, e.g. you can define a set of
pages as one task) in which the user transcribes some
data from the page. The user can easily zoom in and out of the PDF page,
and, as you will see, it supports PDFs containing text as well as images, so
you can use it without much trouble to transcribe PDFs made of
scanned pages from other documents or books.
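
To make the page-to-task idea concrete, here is a rough Python sketch of how
one PDF could be split into tasks, either one page per task or a set of pages
per task. This is only an illustration: the function name and the dictionary
keys (pdf_url, first_page, last_page) are assumptions made up for this
example, not the actual code or task format of the demo app.

```python
from typing import Dict, List


def build_page_tasks(pdf_url: str, num_pages: int,
                     pages_per_task: int = 1) -> List[Dict]:
    """Split a PDF into tasks, each covering a run of consecutive pages.

    The keys used below are hypothetical; adapt them to whatever
    your own task presenter expects.
    """
    if num_pages < 1 or pages_per_task < 1:
        raise ValueError("num_pages and pages_per_task must be positive")

    tasks = []
    # Pages are numbered from 1, as in a PDF viewer.
    for first in range(1, num_pages + 1, pages_per_task):
        # The last task may be shorter if the pages do not divide evenly.
        last = min(first + pages_per_task - 1, num_pages)
        tasks.append({
            "pdf_url": pdf_url,
            "first_page": first,
            "last_page": last,
        })
    return tasks
```

With the default of one page per task, a 5-page PDF gives 5 tasks; with
pages_per_task=2 it gives 3 tasks, the last one holding only page 5.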

The *demo app* is a template, so for now the goal is to show how you can
re-use it and add some context and layout to the transcription. In this
example there is only one input box, used to transcribe the
whole page; however, you can use any form inputs to extract just the
information you are interested in, e.g. input fields for
authors, institutions, captions, ...
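
As a small sketch of what structured input fields could produce instead of a
single free-text box, the Python snippet below turns a submitted form into a
tidy answer record. The field names (authors, institutions, captions) come
from the examples above, but the function and the record layout are
assumptions for illustration only, not part of PyBossa or the demo app.

```python
from typing import Dict, Iterable

# Hypothetical field names, taken from the examples in this message.
FIELDS = ("authors", "institutions", "captions")


def clean_answer(form: Dict[str, str],
                 fields: Iterable[str] = FIELDS) -> Dict[str, str]:
    """Keep only the expected fields, trim whitespace, drop empty values."""
    answer = {}
    for name in fields:
        value = form.get(name, "").strip()
        if value:
            answer[name] = value
    return answer
```

Fields the volunteer left blank are simply omitted from the record, and any
unexpected form keys are ignored.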

This application could be used directly with the Internet Archive (I tested
it with the link that Sam sent me from the Internet Archive and it worked
really well). All you have to do is add a specific configuration for PDF
files on the web server, and the PyBossa application will be able to use any
PDF available from that server. If the server has an API, then much richer
and more complex versions of this *demo app* could be created for
transcribing documents. If you need help, or if you would prefer to have a
"virtual" meeting with me, let me know; I'll be more than happy to
talk with you.

Best regards,

Daniel

-- 
··························································································································································
http://github.com/teleyinex
http://www.flickr.com/photos/teleyinex
··························································································································································
Please do NOT use proprietary file formats such as DOC and XLS for
exchanging documents; use PDF, HTML, RTF, TXT, CSV
or any other format that does not require a program from a
specific vendor to handle the information it contains.
··························································································································································