[humanities-dev] Shakespeare Annotations

Rufus Pollock rufus.pollock at okfn.org
Thu Apr 12 22:11:19 UTC 2012

One note: *many* of the Open Shakespeare plays have few or no
annotations at all (this should be a simple query in AnnotateIt, right
;-) ). Thus one option would be to go with the play text that has the
most annotations (updating the Shakespeare texts as needed).
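
A minimal sketch of that count, assuming the annotations have already
been exported as a list of dicts that each carry a "uri" field, as in
the sample quoted below. The function name and the sample records are
hypothetical, and this does not use any actual AnnotateIt query API:

```python
from collections import Counter

def annotations_per_text(annotations):
    """Count annotations per text, keyed on each annotation's "uri" field."""
    return Counter(a["uri"] for a in annotations)

# Hypothetical sample records; real ones would come from the annotation store.
sample = [
    {"uri": "macbeth", "quote": "HECATE"},
    {"uri": "macbeth", "quote": "Thane"},
    {"uri": "hamlet", "quote": "Yorick"},
]

counts = annotations_per_text(sample)
# most_common(1) yields the text with the most annotations and its count.
best_uri, best_count = counts.most_common(1)[0]
```

Texts with no annotations at all would simply be missing from the
counts, so they would need to be compared against the full list of
plays separately.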


On 12 April 2012 19:13, Nick Stenning <nick.stenning at okfn.org> wrote:
> Ok, so I've now had a look at these properly. Here's where I'm at:
> As far as I can tell, the texts you've used are almost identical to the Moby
> texts. The key word here is "almost". Unfortunately, the fact they're not
> identical rules out a simple conversion from [FinalsClub HTML rendering] ->
> [character offsets] -> [Open Shakespeare HTML rendering]. It's their
> similarity, however, that would make it all the more tragic if we simply
> created another edition of each play and added your annotations on those,
> rather than attempting to display them on the same texts that our people
> have annotated.
> So, what I have to do, somehow, is establish a mapping from [character
> offset in FC edition] to [character offset in OS/Moby edition]. I don't know
> of any tools that exist to do this, but I have some ideas of how it could be
> done which I will code up if I need to.
> As far as the annotations themselves go, there are a few issues that need
> resolving before they can be used. Here's a slightly cleaned up (and
> truncated) version of one of the annotations:
>     {
>       "text": " '\"Hecate\" is also ... scene i). '",
>       "uri": "The Tragedy of Macbeth 7.html",
>       "ranges": [{
>         "start": "/span[19]",
>         "end": "/span[20]",
>         "startOffset": 49,
>         "endOffset": 55
>       }],
>       "quote": " 'HECATE ",
>       "finalsclub_id": 5029
>     }
> As you can see, it apparently starts in '/span[19]' and ends in '/span[20]',
> which can't possibly be true given it contains just the text "HECATE", and
> each span represents an entire scene. This appears to be the case for all
> the annotations: they always end at least in the next scene! Now, I could
> just subtract 1 from the index of each "end" xpath, but it would be good if
> David could have a look in his code and see if this is the right thing to
> do.
> In addition, there's a lot of odd quoting going on in the "quote" and "text"
> fields -- but that's relatively easy for me to fix up. Ditto "ranges" being
> an array, not a single object.
> Anyway, the action points from this email are:
> 1) Anyone reading this who knows of tools to fuzzily align similar texts:
> please let us know.
> 2) David: could you check the code that generates these XPaths?
> Best,
> Nick
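
On the [character offset in FC edition] -> [character offset in OS/Moby
edition] mapping: one possible starting point, offered only as a sketch,
is SequenceMatcher from Python's standard difflib module. It finds the
blocks that two strings share, and an offset inside a shared block can
be carried across directly. The helper names below are hypothetical,
and whether this copes with the real differences between the editions
is untested:

```python
from difflib import SequenceMatcher

def build_offset_map(src, dst):
    """Return the non-empty matching blocks as (src_start, dst_start, length)."""
    # autojunk=False disables the popularity heuristic, which would otherwise
    # treat very common characters as junk in long texts.
    matcher = SequenceMatcher(None, src, dst, autojunk=False)
    return [(m.a, m.b, m.size) for m in matcher.get_matching_blocks() if m.size]

def map_offset(blocks, offset):
    """Map a character offset in src to dst; None if it falls in a changed region."""
    for a, b, size in blocks:
        if a <= offset < a + size:
            return b + (offset - a)
    return None

# Made-up stand-ins for the two editions of a line.
fc_text = "Enter HECATE, meeting the Witches"
os_text = "Enter HECATE meeting the three Witches"

blocks = build_offset_map(fc_text, os_text)
start = fc_text.index("HECATE")
mapped = map_offset(blocks, start)
```

Offsets that fall inside text present in only one edition come back as
None and would need a fallback, such as snapping to the nearest matched
character. Character-level matching over a whole play may also be slow,
so aligning scene by scene, or on words rather than characters, may be
more practical.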

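The clean-up Nick describes could look roughly like the following. It
rests on two unconfirmed assumptions: that subtracting 1 from the index
of each "end" XPath really is the right correction (the point David was
asked to check), and that the odd quoting amounts to stray whitespace
and single quotes at either end of the "text" and "quote" fields. All
function names are hypothetical:

```python
import re

def decrement_xpath_index(xpath):
    """Turn '/span[20]' into '/span[19]' (trailing bracketed index only)."""
    return re.sub(r"\[(\d+)\]$",
                  lambda m: "[%d]" % (int(m.group(1)) - 1),
                  xpath)

def strip_stray_quotes(value):
    """Drop surrounding whitespace and single quotes.

    Caveat: this would also strip a legitimate leading apostrophe.
    """
    return value.strip().strip("'").strip()

def clean_annotation(raw):
    """Return a cleaned copy of one annotation; the input is left untouched."""
    cleaned = dict(raw)
    cleaned["text"] = strip_stray_quotes(raw["text"])
    cleaned["quote"] = strip_stray_quotes(raw["quote"])
    ranges = raw["ranges"]
    if isinstance(ranges, dict):
        # Accept a single object as well as an array of ranges.
        ranges = [ranges]
    # Assumption: every "end" XPath is off by one scene.
    cleaned["ranges"] = [dict(r, end=decrement_xpath_index(r["end"]))
                         for r in ranges]
    return cleaned

# The (truncated) annotation from the email above.
raw = {
    "text": " '\"Hecate\" is also ... scene i). '",
    "uri": "The Tragedy of Macbeth 7.html",
    "ranges": [{"start": "/span[19]", "end": "/span[20]",
                "startOffset": 49, "endOffset": 55}],
    "quote": " 'HECATE ",
    "finalsclub_id": 5029,
}
cleaned = clean_annotation(raw)
```

With this, the "quote" field becomes HECATE and the range ends in
/span[19], the same span it starts in.
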
Co-Founder, Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/
