[annotator-dev] offset off by one

Marder, Andrew amarder at hbs.edu
Fri Jul 24 16:42:54 UTC 2015


Dear Annotator Developers,

I am running into a strange off-by-one error. I am using Python to convert
an HTML document plus annotations into a tagged text document. To check
that my code works properly, I want to confirm that I can recreate the
quote field stored in an annotation from the original HTML document and
the range field. Here is some pseudo-code describing how I am recreating
the quote:

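def get_range(doc, start, end, startOffset, endOffset):
    # start/end are the XPaths from the annotation's range field;
    # startOffset/endOffset are its character offsets.
    started = False
    ended = False

    # Number of characters collected from the end element onwards.
    nchars_past_end = 0
    result = ''

    # doc.iterator() yields (path, text) pairs in document order.
    for path, text in doc.iterator():
        if path.endswith(start):
            started = True
        if path.endswith(end):
            ended = True

        if started:
            result += text
        if ended:
            nchars_past_end += len(text)

        # Stop once enough text past the end element has been collected.
        if nchars_past_end >= endOffset:
            break

    # Trim the overshoot at the end, then drop the leading startOffset
    # characters.
    cutoff = nchars_past_end - endOffset
    return result[startOffset:(len(result) - cutoff)]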
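
To make the offset arithmetic concrete, here is a minimal, runnable sketch
of the same idea. It assumes the document has already been flattened into
a list of (path, text) pairs, and that each offset counts characters from
the beginning of its element. The function name quote_from_pairs, the
sample paths, and the sample texts are made up for illustration; this is
not taken from Annotator itself.

```python
def quote_from_pairs(pairs, start, end, start_offset, end_offset):
    """Rebuild a quote from (path, text) pairs in document order.

    Assumption: start_offset and end_offset count characters from the
    beginning of the start and end elements respectively.
    """
    position = 0        # characters seen so far in document order
    start_base = None   # absolute position where the start element begins
    end_base = None     # absolute position where the end element begins
    chunks = []

    for path, text in pairs:
        if start_base is None and path.endswith(start):
            start_base = position
        if end_base is None and path.endswith(end):
            end_base = position
        chunks.append(text)
        position += len(text)

    if start_base is None or end_base is None:
        raise ValueError("start or end path not found")

    full_text = "".join(chunks)
    return full_text[start_base + start_offset:end_base + end_offset]


# Hypothetical sample data, not from the linked files.
pairs = [
    ("/div[1]/p[1]", "Hello "),
    ("/div[1]/p[2]", "brave new "),
    ("/div[1]/p[3]", "world!"),
]
quote = quote_from_pairs(pairs, "/p[2]", "/p[3]", 6, 5)
```

Working with absolute positions keeps the start and end slicing symmetric,
which may make an off-by-one on either side easier to spot.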

This code works with these annotations and this HTML document:

https://github.com/amarder/hal/blob/master/tagger/data_highlights.json

https://github.com/amarder/hal/blob/master/tagger/data_filing.html


But when I test the code with the following annotations and HTML document,
my quotes are shifted one character to the left (there is one extra
character at the beginning of the string and one character missing at the
end):

https://github.com/amarder/hal/blob/master/tagger/data_text_highlights.json

https://github.com/amarder/hal/blob/master/tagger/data_text_filing.html
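
One way to quantify a shift like this is to search near the computed start
position for the place where the stored quote actually matches. Below is a
minimal sketch; the helper name find_shift, the max_shift parameter, and
the sample text are hypothetical and only meant to illustrate the check.

```python
def find_shift(full_text, computed_start, quote, max_shift=5):
    """Return the smallest-magnitude shift d for which the stored quote
    matches full_text starting at computed_start + d, or None if no shift
    within max_shift characters matches."""
    # Try 0 first, then +1, -1, +2, -2, ...
    candidates = [0]
    for d in range(1, max_shift + 1):
        candidates.extend([d, -d])

    for d in candidates:
        begin = computed_start + d
        if begin < 0:
            continue
        if full_text[begin:begin + len(quote)] == quote:
            return d
    return None


# Hypothetical sample: the quote really starts at index 4, but the
# computed start is 3, i.e. one character too far to the left.
shift = find_shift("The quick brown fox", 3, "quick")
```

A positive result means the computed start is too far to the left by that
many characters; a result of 0 means the quote is recreated exactly.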


I've been looking at the source code here:

https://github.com/openannotation/xpath-range/blob/master/src/range.coffee


But I haven't been able to figure out why I'm having this issue. Any
thoughts on what might be happening here would be greatly appreciated!

Andrew

PS I used Annotator v1.1.0 to create these annotations.



