[humanities-dev] Shakespeare Annotations

Nick Stenning nick.stenning at okfn.org
Thu Apr 12 18:13:14 UTC 2012


Ok, so I've now had a look at these properly. Here's where I'm at:

As far as I can tell, the texts you've used are almost identical to the
Moby texts. The key word here is "almost". Unfortunately, the fact they're
not identical rules out a simple conversion from [FinalsClub HTML
rendering] -> [character offsets] -> [Open Shakespeare HTML rendering].
It's their similarity, however, that would make it all the more tragic if
we simply created another edition of each play and added your annotations
on those, rather than attempting to display them on the same texts that our
people have annotated.

So, what I have to do, somehow, is establish a mapping from [character
offset in FC edition] to [character offset in OS/Moby edition]. I don't
know of any existing tools that do this, but I have some ideas for how it
could be done, which I will code up if I need to.
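
To make the idea concrete, here is a rough sketch of one possible approach,
not a tested solution. It uses Python's standard difflib to find the
stretches the two editions share, then translates an offset that falls
inside a shared stretch. The function names and the two sample strings are
made up for illustration; they are not taken from either edition.

```python
from bisect import bisect_right
from difflib import SequenceMatcher


def build_blocks(fc_text, os_text):
    """Return (fc_start, os_start, length) for each stretch the texts share."""
    # autojunk=False turns off difflib's "popular character" heuristic, which
    # would otherwise ignore very common characters in long texts.
    matcher = SequenceMatcher(None, fc_text, os_text, autojunk=False)
    return [(m.a, m.b, m.size) for m in matcher.get_matching_blocks() if m.size]


def map_offset(blocks, fc_offset):
    """Map an FC character offset to an OS offset, or None if it has no match."""
    starts = [fc_start for fc_start, _, _ in blocks]
    i = bisect_right(starts, fc_offset) - 1
    if i < 0:
        return None
    fc_start, os_start, size = blocks[i]
    if fc_offset < fc_start + size:
        return os_start + (fc_offset - fc_start)
    return None  # the offset falls in text that only the FC edition has


# Hypothetical sample strings standing in for the two editions.
fc_text = "Enter the three Witches."
os_text = "Enter three Witches."
blocks = build_blocks(fc_text, os_text)
mapped = map_offset(blocks, 10)   # offset of "three" in the FC string
unmapped = map_offset(blocks, 7)  # inside "the", which the OS string lacks
```

Offsets that land in text unique to the FC edition come back as None and
would need a policy of their own (snap to the nearest match, or flag for
manual review). SequenceMatcher is also slow on play-length texts, so
aligning scene by scene or line by line is probably more practical.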

As far as the annotations themselves go, there are a few issues that need
resolving before they can be used. Here's a slightly cleaned up (and
truncated) version of one of the annotations:

    {
      "text": " '\"Hecate\" is also ... scene i). '",
      "uri": "The Tragedy of Macbeth 7.html",
      "ranges": [{
        "start": "/span[19]",
        "end": "/span[20]",
        "startOffset": 49,
        "endOffset": 55
      }],
      "quote": " 'HECATE ",
      "finalsclub_id": 5029
    }

As you can see, it apparently starts in '/span[19]' and ends in
'/span[20]', which can't possibly be true given it contains just the text
"HECATE", and each span represents an entire scene. This appears to be the
case for all the annotations: they always end at least in the next scene!
Now, I could just subtract 1 from the index of each "end" XPath, but it
would be good if David could have a look at his code and confirm whether
this is the right thing to do.
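
For reference, the subtract-1 workaround would look roughly like the sketch
below. It assumes every "end" value has the shape seen in the sample, a path
ending in a bracketed index such as '/span[20]', and it is only a guess at
the fix until the generating code has been checked. The function names are
made up for illustration.

```python
import re

# Matches a trailing positional index such as "[20]" at the end of an XPath.
_TRAILING_INDEX = re.compile(r"\[(\d+)\]$")


def decrement_end_xpath(xpath):
    """Return the XPath with its final positional index reduced by one."""
    match = _TRAILING_INDEX.search(xpath)
    if match is None:
        raise ValueError("no trailing index in %r" % xpath)
    index = int(match.group(1))
    if index <= 1:
        # XPath positions start at 1, so there is nothing earlier to point at.
        raise ValueError("cannot decrement index in %r" % xpath)
    return xpath[:match.start()] + "[%d]" % (index - 1)


def fix_range(rng):
    """Return a copy of one range dict with its "end" XPath decremented."""
    fixed = dict(rng)
    fixed["end"] = decrement_end_xpath(rng["end"])
    return fixed


# The range from the sample annotation above.
sample = {"start": "/span[19]", "end": "/span[20]",
          "startOffset": 49, "endOffset": 55}
fixed = fix_range(sample)
```

Applied to the sample, this leaves "start" and both offsets untouched and
moves "end" into the same span as "start".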

In addition, there's a lot of odd quoting going on in the "quote" and
"text" fields -- but that's relatively easy for me to fix up. Ditto
"ranges" being an array, not a single object.

Anyway, the action points from this email are:

1) Anyone reading this who knows of tools to fuzzily align similar texts:
please let us know.
2) David: could you check the code that generates these XPaths?

Best,
Nick