[open-science-dev] Fwd: Data-Transcriber Follow up

Rufus Pollock rufus.pollock at okfn.org
Thu Jul 7 14:22:24 UTC 2011

---------- Forwarded message ----------
From: Lucas Ferreira Mation <lucasmation at gmail.com>
Date: 6 July 2011 22:08
Subject: Data-Transcriber Follow up
To: Francois Grey <francois.grey at cern.ch>, Daniel Lombraña González <
teleyinex at gmail.com>, Rufus Pollock <rufus.pollock at okfn.org>, Guo Xu <
digitalepourpre at gmail.com>, Javier Ruiz <javierruizorg at gmail.com>, Nazareno
Andrade <nazareno at gmail.com>, Nigini Abilio <nigini at gmail.com>,
jenny.molloy at okfn.org

Dear all,

It was great to see a demo materialize at the Hackfest. This email is to
follow up on the work: introduce people, see who is interested, and discuss
how to push the development of the tool forward.

In Brazil, besides me, we have 3 developers based at the UFCG computer science
department, in the Northeast of Brazil, who intend to work on this:
Nazareno, a professor, Nigini, a PhD student, and an undergrad assistant. We
had a meeting today and we are willing to push this version into a fully
functional demo over the next month. Or at least try. But in order to do
that it would be good for us to discuss the broad options and where we want to

The broad idea we discussed is to introduce
Bossa <http://boinc.berkeley.edu/trac/wiki/BossaIntro> in the demo.
Bossa will manage the users and job assignments (a "job"
consists of a pair: "table image" + "unique Google spreadsheet"). This will
mean coding in PHP and translating the current code to PHP. For the moment
we intend to stick with Google Docs for the table interface on the right
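
To make the pairing concrete, here is a minimal sketch of a job as a
"table image" + "unique spreadsheet" pair, with one user working on one
page at a time. It is only an illustration, written in Python rather than
PHP: the names Job, image_name, spreadsheet_id, assigned_to and
assign_next_job are hypothetical and are not part of Bossa or of any
Google API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    """One transcription job: a table image paired with its own spreadsheet."""
    image_name: str                    # e.g. "Book1page3.tif"
    spreadsheet_id: str                # hypothetical id of the unique spreadsheet
    assigned_to: Optional[str] = None  # user currently working on this job


def assign_next_job(jobs: List[Job], user: str) -> Optional[Job]:
    """Return the job the user should work on, or None if none is left.

    A user who already holds a job gets that same job back, so each user
    is only sent to one page at a time.
    """
    for job in jobs:
        if job.assigned_to == user:
            return job
    for job in jobs:
        if job.assigned_to is None:
            job.assigned_to = user
            return job
    return None
```

In the real tool this bookkeeping would be Bossa's responsibility; the
sketch only shows the shape of the data we have in mind.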

Let me know if you guys are ok with this path. We could still change these
things later, but the idea is to have something working soon so that we can
test it.

If you can, let us know how each person can contribute (a more
detailed list is below).

Besides the Brazilians, Daniel, François, Jenny, Guo and Rufus (who were
at the hackfest), I'm also including Javier Ruiz in this email. Javier has
pointed me to http://scripto.org/, a similar open source tool for transcribing
text that can even be used to generate tables (although the interface is not
good: it is the same as the one used to generate tables in a Wikipedia
article). Also, the crowdsourcing is wiki-like: more fluid, with version
control but with no explicit job assignment or volunteer management. Thus it
can be a good source for code, but I would still use Bossa. Javier's group is
involved in creating a platform for volunteer table transcription of
genealogy records that have a more fixed template.


We were completing the Task list:

1) Image Preparation  (DONE)
1.5) Rename images according to some index, e.g. "Book1page3.tif"

2) Job Management
2.1) Integrate the BOSSA account manager. (Credit can be attributed because each
user is only sent to one page at a time)
    2.1.1) Automatically create a job when the user identifies a table in the document
        a) associate that image (and others of the same page) with that job
        b) create a unique Google Docs spreadsheet for that page (automate it
using the API)

3) User interface
3.1) Define the action that creates jobs ("is there a table here?")
3.3) Visualize the Google spreadsheet on the right of the page (DONE)
    3.3.1) take the headers off as much as possible (preferably only table)
3.4) Visualize the form extracting
3.5) Create HTML form for metadata and embed in page
3.6) add mark rectangle tool  (not a priority now)
    3.6.1) associate this info to OCR (done at the server) and return the
result. User decides whether to use it or not.
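
As an illustration of item 1.5, the naming scheme could be handled with a
small helper like the one below. This is a sketch under assumptions: the
pattern "Book<N>page<M>.tif" is inferred from the single example
"Book1page3.tif", and the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed naming pattern, inferred from the example "Book1page3.tif".
NAME_PATTERN = re.compile(r"^Book(\d+)page(\d+)\.tif$")


def image_name(book: int, page: int) -> str:
    """Build the file name for a given book and page number."""
    return f"Book{book}page{page}.tif"


def parse_image_name(name: str) -> Tuple[int, int]:
    """Recover (book, page) from a file name, or raise ValueError."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected image name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def sort_images(names):
    """Sort names by book, then page, numerically (page 2 before page 10)."""
    return sorted(names, key=parse_image_name)
```

Parsing the numbers back out lets pages be ordered numerically, which a
plain alphabetical sort of the file names would not do.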

Co-Founder, Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/