[okfn-labs] PyBossa for cultural heritage transcription/description?

Sam Leon sam.leon at okfn.org
Wed Nov 28 09:25:27 UTC 2012


Dear Daniel,

This is fantastic and we will be blogging about it; it's a much-needed
glimpse of the potential here.

One question: how easy would it be to customise this app so that you could
select arbitrary regions of the PDF to annotate, much like the What's the
Score app Jonathan circulated last week?

Really looking forward to pushing this out. I think people are going to be
very excited.

Cheers,
Sam

On Tue, Nov 27, 2012 at 9:05 AM, Daniel Lombraña González <
teleyinex at gmail.com> wrote:

> Dear all,
>
> For PDF transcription I've recently created a new PyBossa *demo app* that
> you can find and test at crowdcrafting.org <http://crowdcrafting.org/app/pdftranscribe>.
> The application is really simple, as its purpose is to serve as a template
> for creating really interesting applications for PDF transcription :-)
>
> The application basically loads an external PDF file in the web browser
> (without using any third-party plugin or manipulating the PDF) and each
> page becomes a task (this could be adapted, e.g. you can specify a set of
> pages as one task), where the users have to transcribe some
> data from the page. The user can zoom in and out of the PDF page really easily
> and, as you will see, it supports PDFs with text and also images, so you can
> use it without too many problems to transcribe PDFs that contain
> scanned pages from other documents or books.
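
A minimal sketch of the page-to-task mapping described above, covering both the default (one page per task) and the adapted case (a set of pages per task). The function name `page_tasks`, the payload keys `pdf_url` and `pages`, and the example URL are assumptions made for illustration; the thread does not show the demo app's actual task format.

```python
def page_tasks(pdf_url, n_pages, pages_per_task=1):
    """Return one task payload per group of consecutive pages.

    Hypothetical payload layout: the real demo app may store tasks differently.
    """
    if n_pages < 1 or pages_per_task < 1:
        raise ValueError("n_pages and pages_per_task must be positive")
    tasks = []
    # Walk the document in steps of pages_per_task; pages are 1-indexed.
    for first in range(1, n_pages + 1, pages_per_task):
        last = min(first + pages_per_task - 1, n_pages)
        tasks.append({"pdf_url": pdf_url, "pages": list(range(first, last + 1))})
    return tasks


if __name__ == "__main__":
    # Five pages grouped two at a time; the last task holds the leftover page.
    for task in page_tasks("http://example.org/book.pdf", n_pages=5, pages_per_task=2):
        print(task)
```

With `pages_per_task=1` this gives one task per page; a larger value groups consecutive pages, and the final task simply takes whatever pages remain.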
>
> The *demo app* is a template, so right now the goal is to show how you can
> re-use it to add some context and layout to the transcription. In this
> example there is only one input box, which can be used to transcribe the
> whole page; however, you can use any form input to extract just the
> information that you want, e.g. input fields for
> authors, institutions, or captions.
>
> This application could be used directly with the Internet Archive (I
> tested it with the link that Sam sent me from the Internet Archive and it
> worked really well). All you have to do is add a specific configuration
> for PDF files in the web server and the PyBossa application will be able to
> use any PDF available from the server. If the server has an API, then
> really beautiful and complex versions of this *demo app* could be created
> for transcribing documents. If you need help, or if you prefer to have a
> "virtual" meeting with me, let me know, as I'll be more than happy to
> talk with you.
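
The message does not say which web server configuration is meant. One plausible reading, and it is only an assumption, is that the server hosting the PDFs must send a cross-origin (CORS) header so that a page served from another site can fetch the files in the browser. The sketch below shows that idea with Python's standard `http.server`; the names `extra_headers`, `PDFHandler` and `serve` are hypothetical, and a production server (Apache, nginx, ...) would set the same header in its own configuration.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def extra_headers(path, allowed_origin="*"):
    """Return the additional response headers for a request path.

    Assumption: only PDF files need to be readable from other origins.
    """
    if path.split("?", 1)[0].lower().endswith(".pdf"):
        return {"Access-Control-Allow-Origin": allowed_origin}
    return {}


class PDFHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory, adding CORS headers to PDFs."""

    def end_headers(self):
        for name, value in extra_headers(self.path).items():
            self.send_header(name, value)
        super().end_headers()


def serve(port=8000):
    """Start the server on localhost; blocks until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), PDFHandler).serve_forever()
```

Calling `serve()` from a directory containing PDFs would expose them at `http://127.0.0.1:8000/` with the extra header on `.pdf` responses only.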
>
> Best regards,
>
> Daniel
>
>
>
> On Fri, Nov 23, 2012 at 2:06 PM, Daniel Lombraña González <
> teleyinex at gmail.com> wrote:
>
>> Hi again,
>>
>> Let me introduce you to Francisco Brasileiro and Lucas Ferreira, our
>> contacts for the project on transcribing old books that they are running in
>> collaboration with the Internet Archive for the Brazilian government.
>>
>> Francisco and Lucas, Sam has recently contacted someone who is working
>> with the Internet Archive, and they contacted us to ask about the
>> possibility of creating a PyBossa project for transcribing data
>> from documents. If I'm not mistaken, your Brazil project has
>> an agreement with the Internet Archive for scanning the books (maybe it is
>> already done), and you have almost finished building the full application
>> that will allow you to extract the data from those PDFs.
>>
>> As all of you share a similar interest, I think that you should meet and
>> talk to each other, as a nice collaboration could arise from this :-)
>>
>> Let me know if you need more info about PyBossa, ok?
>>
>> Cheers,
>>
>> Daniel
>>
>>
>>
>>
>>
>> On Fri, Nov 23, 2012 at 12:14 PM, Sam Leon <sam.leon at okfn.org> wrote:
>>
>>> Hi Daniel,
>>>
>>> Amazing!
>>>
>>> Could you please intro me to the colleagues you refer to who are working
>>> with the Internet Archive? I'd love to hear more about this use case so
>>> that we can publicise this. The people who we engage via openglam.org would be
>>> *very, very* keen to hear about this.
>>>
>>> Cheers,
>>> Sam
>>>
>>>
>>> On Fri, Nov 23, 2012 at 7:25 AM, Daniel Lombraña González <
>>> teleyinex at gmail.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> For image classification and transcription PyBossa only needs the
>>>> following:
>>>>
>>>> 1.- A list of image files that can be accessed via HTTP, for example
>>>> on Flickr or in a personal HTTP folder
>>>> 2.- Modify the Flickr Person Finder app to fit their needs (what do they want
>>>> to transcribe? are they looking for specific elements in the pictures?)
>>>> 3.- Create the tasks in crowdcrafting.org
>>>> 4.- Start collecting the data :-)
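
Steps 1 and 3 above can be sketched as turning a list of image URLs into task-creation requests. This is a hedged illustration, not the project's actual script: the `/api/task` path, the `api_key` query parameter and the `app_id`/`info` fields reflect my understanding of the PyBossa REST API and should be checked against its documentation, and the image URLs, key, app id and question are placeholders. The requests are only built, never sent.

```python
import json
import urllib.parse
import urllib.request


def build_task_request(server, api_key, app_id, image_url, question):
    """Build (but do not send) a POST request that would create one task."""
    # Assumed payload shape: an app id plus a free-form "info" dictionary.
    payload = {"app_id": app_id, "info": {"url": image_url, "question": question}}
    query = urllib.parse.urlencode({"api_key": api_key})
    return urllib.request.Request(
        f"{server.rstrip('/')}/api/task?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical image list; any URLs reachable over HTTP would do.
IMAGES = [
    "http://example.org/scans/page-001.jpg",
    "http://example.org/scans/page-002.jpg",
]

requests_to_send = [
    build_task_request("http://crowdcrafting.org", "YOUR-API-KEY", 1, url,
                       "What do you see in this image?")
    for url in IMAGES
]
```

Passing each request to `urllib.request.urlopen` would submit it; that call is left out so the sketch runs offline and without a real key.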
>>>>
>>>> Our colleagues in Brazil have more or less a full workflow for
>>>> transcribing big scanned books. Actually, they are collaborating with people
>>>> from the Internet Archive. I think we could contact them again :-)
>>>>
>>>> Cheers,
>>>>
>>>> Daniel
>>>>
>>>>
>>>>
>>>> On Thu, Nov 22, 2012 at 11:41 AM, Sam Leon <sam.leon at okfn.org> wrote:
>>>>
>>>>> Hi Daniel,
>>>>>
>>>>> I am sitting here with a musicology archivist from the Archives
>>>>> de la Ville de Bruxelles who is digitising content for the Internet Archive.
>>>>>
>>>>> How far away are we from doing something with this kind of image?
>>>>>
>>>>> http://archive.org/details/leprinceigoropra00pvms
>>>>>
>>>>> Best,
>>>>> Sam
>>>>>
>>>>> On Thu, Nov 22, 2012 at 8:33 AM, Daniel Lombraña González <
>>>>> teleyinex at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This is great :D You can actually create an app using the Flickr
>>>>>> Person Finder as a template and modify only one link to get the images from
>>>>>> Flickr. The goal could be to use the Flickr Commons
>>>>>> <http://www.flickr.com/commons> pools and classify the images, etc.
>>>>>> Actually, lots of museums and institutions are pushing photos to Flickr
>>>>>> Commons <http://www.flickr.com/commons/institutions/>, so we only need to
>>>>>> contact one of those participating institutions and see if
>>>>>> they want the app :-)
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 21, 2012 at 9:15 PM, Etienne Posthumus <
>>>>>> eposthumus at gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 21 November 2012 17:55, Jonathan Gray <jonathan.gray at okfn.org> wrote:
>>>>>>>
>>>>>>>> It is a lovely project, and I'm wondering how far we are from being
>>>>>>>> able to - e.g. - have a PyBossa image classification/description project
>>>>>>>> with a cultural heritage institution or open content project (like the
>>>>>>>> Internet Archive or Wikimedia Foundation).
>>>>>>>>
>>>>>>>
>>>>>>> All the pieces are there; as Rufus says, it could be used right now.
>>>>>>> As a matter of fact, it is being done, albeit for a simpler application.
>>>>>>>
>>>>>>> We are busy making a 'tagging game' for the Amsterdam Museum to
>>>>>>> allow middle school pupils to tag items as part of their school visits to
>>>>>>> the museum.
>>>>>>> The source for the images is the Adlib museum management system,
>>>>>>> which has an API. The first prototype version runs on Django, as the museum
>>>>>>> were in a bit of a hurry to get something out the door and had specific
>>>>>>> requirements with regard to logins and writing the data back to the Adlib
>>>>>>> database. In phase 2 of the project we are adding a link to PyBossa so that
>>>>>>> one can generate a PyBossa app from the Django application, without needing
>>>>>>> to do any Python coding.
>>>>>>>
>>>>>>> The images are previewed and selected from the Adlib search API,
>>>>>>> questions are managed by the museum staff in the Django Admin backend, and
>>>>>>> the PyBossa items are generated as a combination of these two and created
>>>>>>> using the PyBossa API plus the user secret key.
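
One way to read "a combination of these two" is that every selected image is paired with every question. That reading is an assumption, as are the name `combine`, the item keys and the sample data, which stand in for the Adlib search results and the questions entered in the Django admin. A minimal sketch:

```python
from itertools import product


def combine(images, questions):
    """Pair every selected image with every question (one item per pair)."""
    return [
        {"image_url": image, "question": question}
        for image, question in product(images, questions)
    ]


# Hypothetical stand-ins for the Adlib search results and the staff's questions.
images = ["http://example.org/adlib/1.jpg", "http://example.org/adlib/2.jpg"]
questions = ["What is shown here?", "Which period is this from?"]

items = combine(images, questions)
print(len(items))  # 4: two images times two questions
```

Each resulting dictionary could then be submitted as the content of one PyBossa item using the API and the user's key, as the message describes.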
>>>>>>>
>>>>>>> As soon as this part is functional it should appear in the
>>>>>>> Crowdcrafting site as an app.
>>>>>>>
>>>>>>> EP
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> okfn-labs mailing list
>>>>>>> okfn-labs at lists.okfn.org
>>>>>>> http://lists.okfn.org/mailman/listinfo/okfn-labs
>>>>>>> Unsubscribe: http://lists.okfn.org/mailman/options/okfn-labs
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> ··························································································································································
>>>>>> http://github.com/teleyinex
>>>>>> http://www.flickr.com/photos/teleyinex
>>>>>>
>>>>>> ··························································································································································
>>>>>> Please do NOT use proprietary file formats such as DOC and XLS for
>>>>>> exchanging documents; use PDF, HTML, RTF, TXT, CSV
>>>>>> or any other format that does not require a program from a
>>>>>> specific vendor to process the information it contains.
>>>>>>
>>>>>> ··························································································································································
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sam Leon
>>>>> Community Coordinator
>>>>> Open Knowledge Foundation
>>>>> http://okfn.org/
>>>>> Skype: samedleon
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>



-- 
Sam Leon
Community Coordinator
Open Knowledge Foundation
http://okfn.org/
Skype: samedleon