[annotator-dev] working on fm
Rufus Pollock
rufus.pollock at okfn.org
Fri Mar 16 09:09:27 UTC 2012
On 16 March 2012 02:02, Adam Hyde <adam.hyde at sourcefabric.org> wrote:
>
> Hi
>
> Thanks Rufus, that is very interesting. How would it affect things if the
> read and write environment was the same? Is it possible to change the xpath
> identifier dynamically as the text gets altered?
That's exactly what you'd need to do, and what I was suggesting would be needed :-)
The other point I was making is that if you control the text (not
the case in many of annotator's use cases, e.g. the bookmarklet or
OpenShakespeare) you can start inserting lots of special tags and
possibly use those as your addressing points (though you would
probably need to change annotator a bit ...)
I have a feeling there is a reasonable 80/20 solution out there.
rufus
> Adam
>
> Adam Hyde
> Booktype Project Lead
>
> On 15 Mar 2012 20:56, "Rufus Pollock" <rufus.pollock at okfn.org> wrote:
>
> On 15 March 2012 18:06, Adam Hyde <adam.hyde at sourcefabric.org> wrote:
>> hi
>>
>> I installed it on FLO...
>
> We've thought a lot about this (and there's ongoing thought about this
> as part of the related Textus project [1]). Summary from my point of view:
>
> [1]: http://textusproject.org/
>
> * Ultimately to handle text changes you have to do some kind of
> migration of annotation references.
> * Annotation addresses are based on a pointer to some fixed identifier
> in the text + character offset from there. E.g. identifier could be
> element id, paragraph id, xpath etc (note an xpath identifier is
> really just a special kind of offset ...).
> * The more atomic (i.e. the smaller the area they cover) your addresses,
> the fewer your migration problems (but the worse your interference
> with the text) [2]
>
> * In essence your migration will run an algorithm such as the following
>
> * Compare two texts.
> * For all annotations with atomic sections whose identifier and
> content are unchanged we need do nothing
> * For all sections whose identifier has changed but whose content
> is unchanged update the relevant annotation identifiers (note it could
> be difficult to work out the changes in identifiers to make this
> possible -- e.g. suppose you have cut and pasted one paragraph in a
> document. This will change all xpaths following the cut section and
> before its reinsertion)
> * For all sections with changed content update the offsets
>
> [2]: http://blog.okfn.org/2007/01/24/thinking-about-annotation/
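A rough Python sketch of the migration steps listed above, purely illustrative and not how annotator itself works. It assumes each text is available as a dict from section identifier to section text, and that an annotation is an identifier plus start/end character offsets; `Annotation`, `shift_offset` and `migrate` are made-up names. It also assumes section contents are unique, and it cannot handle a section that has both moved and changed, which is exactly the hard case for path-based identifiers.

```python
from dataclasses import dataclass, replace
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Annotation:
    section_id: str  # fixed identifier in the text (element id, xpath, ...)
    start: int       # character offset from the start of that section
    end: int         # exclusive end offset


def shift_offset(old_text, new_text, offset):
    """Map a character offset in old_text to the matching offset in new_text."""
    matcher = SequenceMatcher(None, old_text, new_text, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if i1 <= offset < i2:
            if tag == "equal":
                return j1 + (offset - i1)
            return j1  # offset fell inside edited text: snap to the edit start
    return len(new_text)  # offset was at (or past) the end of old_text


def migrate(annotations, old_sections, new_sections):
    """Return (migrated, orphaned) annotations for the new version of the text."""
    # Index new sections by content to find moved-but-unchanged sections.
    # Assumption: section contents are unique within a document.
    ids_by_content = {text: sid for sid, text in new_sections.items()}
    migrated, orphaned = [], []
    for ann in annotations:
        old_text = old_sections[ann.section_id]
        if new_sections.get(ann.section_id) == old_text:
            # Identifier and content unchanged: nothing to do.
            migrated.append(ann)
        elif old_text in ids_by_content:
            # Content unchanged, identifier changed: update the identifier.
            migrated.append(replace(ann, section_id=ids_by_content[old_text]))
        elif ann.section_id in new_sections:
            # Same identifier, changed content: update the offsets.
            new_text = new_sections[ann.section_id]
            migrated.append(replace(
                ann,
                start=shift_offset(old_text, new_text, ann.start),
                end=shift_offset(old_text, new_text, ann.end),
            ))
        else:
            # Section is gone: leave the annotation for manual review.
            orphaned.append(ann)
    return migrated, orphaned
```

The smaller the sections, the more annotations fall into the first two branches, which is the point about atomic addresses made above.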
>
> In general this shows that identifiers which are tied to paths in
> the document are especially bad. However they are one of the *only*
> options if you can't interfere with the original document (e.g. by
> inserting your own ids!) -- the other option I know of here is to do
> hashing of small string sections of the document to generate your
> identifiers. This does not require interfering with your document but
> generates addresses into the document. However, it is computationally
> costly and more fragile to character changes.
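To make the hashing option concrete, here is a minimal Python sketch. It assumes the "small string sections" are sentences, found with a deliberately naive regex, and that a truncated SHA-1 digest is an acceptable identifier; `sentence_ids` and its `digest_chars` parameter are hypothetical names.

```python
import hashlib
import re


def sentence_ids(text, digest_chars=12):
    """Return a list of (identifier, start_offset, sentence) tuples."""
    results = []
    # Naive splitter: a sentence runs up to and including its ., ! or ? marks.
    for match in re.finditer(r"[^.!?]+[.!?]*", text):
        chunk = match.group()
        sentence = chunk.strip()
        if not sentence:
            continue  # whitespace between or after sentences
        # The identifier depends only on the sentence content, not its position.
        digest = hashlib.sha1(sentence.encode("utf-8")).hexdigest()[:digest_chars]
        start = match.start() + (len(chunk) - len(chunk.lstrip()))
        results.append((digest, start, sentence))
    return results
```

Inserting or moving a paragraph leaves every other sentence's identifier alone, but a single changed character in a sentence gives it a new identifier, and the whole text has to be rehashed to resolve an address. That is the cost and fragility mentioned above.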
>
> Thus, one extreme option, that would make updating significantly
> easier, but which requires you have complete control of your html text,
> is to insert identifier marks (e.g. in html <span id="{id}"></span>),
> say, every sentence and configure the annotator to utilize these ids
> when generating URIs for annotations ...
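A small Python sketch of the marker idea, under the assumption that you start from plain text (not existing HTML) and split sentences with a naive regex. `mark_sentences` and `make_id` are made-up names, and wiring the resulting ids into annotator is not shown.

```python
import html
import re
import uuid


def mark_sentences(text, make_id=lambda: "s-" + uuid.uuid4().hex):
    """Return HTML with an empty <span id="..."></span> before each sentence."""
    def add_marker(match):
        # Escape the sentence so the plain text is safe to embed in HTML.
        return '<span id="{}"></span>{}'.format(make_id(), html.escape(match.group()))

    # A sentence starts at a non-space, non-terminator character and runs
    # up to and including its ., ! or ? marks.
    return re.sub(r"[^\s.!?][^.!?]*[.!?]*", add_marker, text)
```

The ids have to be generated once and then kept with the text as it is edited; regenerating them on every save would throw away the stability that makes them useful as addressing points.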
>
> Rufus
--
Co-Founder, Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/