[annotator-dev] offset off by one
Marder, Andrew
amarder at hbs.edu
Fri Jul 24 16:42:54 UTC 2015
Dear Annotator Developers,
I am running into a strange off-by-one error. I am using Python to convert an
HTML document plus its annotations into a tagged text document. To check
that my code works, I want to recreate the quote field stored in each
annotation from the original HTML document and the annotation's range
field. Here is a simplified Python version of how I am recreating the
quote:
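def get_range(doc, start, end, startOffset, endOffset):
    """Recreate an annotation's quote from its range field.

    Assumes doc.iterator() yields (path, text) pairs for every text node
    in document order; start/end are the range's XPaths and
    startOffset/endOffset are character offsets into those elements.
    """
    started = False
    ended = False
    nchars_past_end = 0  # characters seen since the end element began
    result = ''
    for path, text in doc.iterator():
        if path.endswith(start):
            started = True
        if path.endswith(end):
            ended = True
        if started:
            result += text
        if ended:
            nchars_past_end += len(text)
            # Stop once the text up to endOffset has been collected.
            if nchars_past_end >= endOffset:
                break
    # Trim the characters of the end element that lie past endOffset.
    cutoff = nchars_past_end - endOffset
    return result[startOffset:len(result) - cutoff]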
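For reference, here is a minimal sketch of what the doc.iterator() call above could look like, built on Python's standard html.parser. The names TextNodeCollector, iter_text_nodes and drop_pre_newline are hypothetical, and the paths it produces only approximate the shape of the start/end XPaths in the range field (an assumption). It is not a full HTML5 tree builder: unlike a browser it does not auto-close <p> elements or insert <tbody>, so its paths can still diverge from the browser's DOM. It does mimic one browser rule that shifts text by exactly one character: the HTML parsing algorithm drops a single newline immediately after a <pre>, <listing> or <textarea> start tag, while html.parser keeps it. Whether that rule is involved with data_text_filing.html is an untested hypothesis.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}
# Elements whose first newline browsers ignore when parsing HTML.
PRE_LIKE = {"pre", "listing", "textarea"}


class TextNodeCollector(HTMLParser):
    """Collect (path, text) pairs for each text run, in document order."""

    def __init__(self, drop_pre_newline=True):
        super().__init__(convert_charrefs=True)
        self.drop_pre_newline = drop_pre_newline
        self.stack = []     # open elements as "tag[n]" path steps
        self.counts = [{}]  # per-depth count of child elements, by tag
        self.just_opened_pre = False
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.just_opened_pre = tag in PRE_LIKE
        siblings = self.counts[-1]
        siblings[tag] = siblings.get(tag, 0) + 1
        if tag in VOID_ELEMENTS:
            return  # void elements never contain text
        self.stack.append(f"{tag}[{siblings[tag]}]")
        self.counts.append({})

    def handle_endtag(self, tag):
        self.just_opened_pre = False
        # Pop back to the most recent matching open element, which
        # tolerates children that were never explicitly closed.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].startswith(tag + "["):
                del self.stack[i:]
                del self.counts[i + 1:]
                break

    def handle_comment(self, data):
        self.just_opened_pre = False

    def handle_data(self, data):
        if self.just_opened_pre and self.drop_pre_newline and data.startswith("\n"):
            data = data[1:]  # browser rule: skip one newline after <pre>
        self.just_opened_pre = False
        if data:
            self.nodes.append(("/" + "/".join(self.stack), data))


def iter_text_nodes(html, drop_pre_newline=True):
    """Return an iterator of (path, text) pairs, a stand-in for doc.iterator()."""
    parser = TextNodeCollector(drop_pre_newline)
    parser.feed(html)
    parser.close()
    return iter(parser.nodes)
```

Comparing the quotes produced with drop_pre_newline=True and drop_pre_newline=False is one quick way to test the hypothesis above.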
This code works with these annotations and this HTML document:
https://github.com/amarder/hal/blob/master/tagger/data_highlights.json
https://github.com/amarder/hal/blob/master/tagger/data_filing.html
But when I test the code with the following annotations and HTML document,
my quotes are shifted one character to the left: there is one extra
character at the beginning of each string and one character missing at
the end:
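To check this symptom across every annotation in a file, a small helper can compare each recreated string with the stored quote field. The helper detect_shift below is hypothetical (not part of Annotator or xpath-range); it relies only on the observation that a left shift by k characters shows up as k extra leading characters and k missing trailing ones, with the length unchanged.

```python
def detect_shift(got, quote, max_shift=5):
    """Compare a recreated string with the stored quote.

    Returns 0 if they match, -k if `got` is shifted k characters to the
    left (k extra leading characters, k missing at the end), +k if it is
    shifted k characters to the right, or None if no shift of at most
    max_shift characters explains the difference.
    """
    if got == quote:
        return 0
    if len(got) != len(quote):
        return None  # a pure shift keeps the length unchanged
    # Keep k below the string length so empty slices cannot match.
    for k in range(1, min(max_shift, len(got) - 1) + 1):
        if got[k:] == quote[:-k]:
            return -k
        if got[:-k] == quote[k:]:
            return k
    return None
```

For example, detect_shift("Xhello worl", "hello world") returns -1, which matches the pattern described above.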
https://github.com/amarder/hal/blob/master/tagger/data_text_highlights.json
https://github.com/amarder/hal/blob/master/tagger/data_text_filing.html
I've been looking at the source code here:
https://github.com/openannotation/xpath-range/blob/master/src/range.coffee
But I haven't been able to figure out why this happens. Any thoughts on
what might be going on would be greatly appreciated!
Andrew
P.S. I used Annotator v1.1.0 to create these annotations.