[humanities-dev] OKFest - projects for the hackday and going forward
Iain Emsley
iain_emsley at austgate.co.uk
Sat Sep 15 15:14:23 UTC 2012
Afternoon,
Getting ready for OKFest next week.
I've been talking separately to Jonathan, Joris and Sam about a couple
of projects which I'm interested in working on.
Firstly, I want to reboot the idea of creating the bookscanner based on
the Textus/Bibserver software. Although the core software may be small
(essentially storing the image and OCRed text), extending it to listen
to a network drive or to use PyBossa, as discussed before, makes it
harder to complete within the Open Culture hackday.
I'll create a repo once I've figured out a set of milestones and a way
forward.
Secondly, I've created a small repo on my own GitHub account called
scriptible (https://github.com/austgate/scriptible). Following various
conversations over the years and a personal need for them, I am trying
to create a set of tools which can be used or extended to download the
Gutenberg index, turn it into something searchable offline and then
download the text. What I would like to do is to serialise the text so
that it can be imported into Textus. Tom, would you have a few minutes
at the hackday to discuss this? There is also a largely untested diff
class, a couple of functions built on the diff library, which lets
users discover differences between two versions of a text (something
Rufus and I talked about years ago). It needs updating and extending.
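To make the diff idea concrete, here is a minimal sketch, assuming that
"the diff library" is Python's standard difflib module. The function
names are hypothetical and are not taken from the scriptible repo.

```python
import difflib


def diff_versions(old_text, new_text, old_label="version_a", new_label="version_b"):
    """Return a unified diff of two versions of a text as a list of lines.

    Hypothetical helper for illustration; not the scriptible API.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_label, tofile=new_label, lineterm=""
        )
    )


def similarity(old_text, new_text):
    """Return a ratio between 0 and 1 describing how alike the two texts are."""
    return difflib.SequenceMatcher(None, old_text, new_text).ratio()
```

Comparing line by line keeps the output readable for prose. A
word-level comparison might suit poetry or heavily revised passages
better, but that is left open here.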
I am also trying to use the tool to extract metadata from the texts and
to create a way of editing the metadata. As part of this I need to
identify any useful ontologies and perhaps revive the letters one so
that the Open Correspondence project can be rebooted in the direction
that I originally wanted to take it: as well as identifying the
correspondent, try to find the texts and authors written about in the
letters to identify possible influences. I'm sure this can be extended
(Jonathan - you hinted about this) to pull in other bits of metadata.
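As a rough illustration of the metadata extraction step, here is a
minimal sketch. It assumes that a plain-text file carries "Title:" and
"Author:" style header lines before a "*** START OF" marker, as many
Project Gutenberg files do. The function name and the choice of fields
are hypothetical and not taken from scriptible.

```python
import re

# Header fields to look for; this list is an assumption and can be extended.
HEADER_FIELD = re.compile(r"^(Title|Author|Language|Release Date):\s*(.+)$")


def extract_metadata(text):
    """Collect simple 'Field: value' header lines that appear before the body."""
    metadata = {}
    for line in text.splitlines():
        if line.startswith("*** START OF"):
            break  # the header is over; the text proper begins here
        match = HEADER_FIELD.match(line.strip())
        if match:
            field, value = match.groups()
            # Keep the first occurrence so later lines cannot overwrite it.
            metadata.setdefault(field.lower().replace(" ", "_"), value.strip())
    return metadata
```

The result is a plain dictionary, so it could be edited by hand or
mapped onto whichever ontology turns out to be useful.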
So not entirely a one-day thing either, but hopefully a useful toolkit
to allow users to open up data sets and do things with them rather than
just store them. I am putting up issues at the moment and trying to tidy
it all up into some sort of roadmap with milestones, and will write a
blog post about it once more of it exists. Hoping to get some of this
started and moved on during Tuesday and afterwards.
This is only a small part of what I think can be done, but I wanted to
make some sort of a start.
Looking forward to meeting people at the Festival and hackdays.
Regards,
Iain