[humanities-dev] OKFest - projects for the hackday and going forward

Iain Emsley iain_emsley at austgate.co.uk
Sat Sep 15 15:14:23 UTC 2012


Afternoon,

Getting ready for OKFest next week.

I've been talking separately to Jonathan, Joris and Sam about a couple 
of projects which I'm interested in working on.

Firstly, I want to reboot the idea of creating the bookscanner based on 
the Textus/Bibserver software. Although the core software may be small 
(essentially storing the image and OCRed text), extending it to listen 
to a network drive or use PyBossa as discussed before makes it slightly 
less attractive to try and complete within the Open Culture hackday. 
I'll create a repo once I've figured out a set of milestones and a way 
forward.

Secondly, I've created a small repo on my own GitHub account called 
scriptible (https://github.com/austgate/scriptible). Following various 
conversations over the years and a personal need for them, I am trying 
to create a set of tools which can be used or extended to download the 
Gutenberg index, turn it into something searchable offline and then 
download the text. What I would like to do is to serialise the text so 
that it can be imported into Textus. Tom, would you have a few minutes 
at the hackday to discuss this? There is also a largely untested diff 
class, a couple of functions wrapping the diff library, so that users 
can discover differences between two versions of a text (something Rufus 
and I talked about years ago). It needs updating and extending.
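
To give a feel for the diff idea, here is a minimal sketch rather than the 
actual scriptible code. It assumes Python and its standard difflib module; 
the function name diff_versions and the labels are made up for illustration.

```python
import difflib


def diff_versions(old_text, new_text, old_label="version_a", new_label="version_b"):
    """Return a unified diff between two versions of a text as a list of lines.

    Hypothetical helper for illustration; not part of scriptible.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # lineterm="" stops difflib adding newlines, since splitlines() removed them.
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


if __name__ == "__main__":
    first = "Call me Ishmael.\nSome years ago."
    second = "Call me Ishmael.\nSome years past."
    for line in diff_versions(first, second):
        print(line)
```

Removed lines come back prefixed with "-" and added lines with "+", so the 
result could be shown to a user as is or processed further.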

I am also trying to use the tool to extract metadata from the texts and 
to create a way of editing the metadata. As part of this I need to 
identify any useful ontologies and perhaps revive the letters one so 
that the Open Correspondence project can be rebooted in the direction 
that I originally wanted to take it: as well as identifying the 
correspondent, trying to find the texts and authors that were written 
about, in order to identify possible influences. I'm sure this can be 
extended (Jonathan - you hinted about this) to possibly pull in other 
bits of metadata.
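
As a rough sketch of the metadata extraction, assuming Python and assuming 
that a Gutenberg plain-text file starts with header lines such as "Title:" 
and "Author:" followed by a line beginning with "***" (layouts vary between 
files, so this is not guaranteed). The name extract_metadata and the choice 
of fields are mine for illustration, not what is in the repo.

```python
import re

# Header fields to look for; this list is an assumption, not a complete set.
HEADER_FIELD = re.compile(r"^(Title|Author|Language|Release Date):\s*(.+)$")


def extract_metadata(text):
    """Collect simple 'Field: value' pairs from the top of a Gutenberg text.

    Hypothetical helper: stops at the first line starting with '***',
    which is assumed to mark the start of the book itself.
    """
    metadata = {}
    for line in text.splitlines():
        if line.startswith("***"):
            break
        match = HEADER_FIELD.match(line.strip())
        if match:
            # Keep the first value seen for each field.
            metadata.setdefault(match.group(1), match.group(2).strip())
    return metadata


if __name__ == "__main__":
    sample = (
        "Title: Emma\n"
        "Author: Jane Austen\n"
        "Language: English\n"
        "*** START OF THIS PROJECT GUTENBERG EBOOK EMMA ***\n"
    )
    print(extract_metadata(sample))
```

A plain dictionary like this would be easy to edit by hand and to map on to 
whichever ontology turns out to be useful.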

So not entirely a one-day thing either, but hopefully a useful toolkit to 
allow users to open up data sets and do things with them rather than 
just store them. I am putting up issues at the moment and trying to tidy 
it all up into some sort of roadmap with milestones and will write a 
blog post about it when I have more in existence. Hoping to get some of 
this started and moved on during Tuesday and afterwards.

This is only a small part of what I think can be done, but I wanted to 
make some sort of a start.

Looking forward to meeting people at the Festival and hackdays.

Regards,

Iain



