[annotator-dev] About the future of the Range implementation

Tue Jun 17 18:39:08 UTC 2014

On Jun 17, 2014 11:28 AM, "Kristof Csillag" <csillag at hypothes.is> wrote:
>
>
>
> On 2014-06-17 19:28, Randall Leeds wrote:
>>
>> The normalization does two things.
>>
>> One is to move the start or end across adjacent tag edges when there is
no text between them. For instance, finding the first text node inside an
element.
>>
>> The other is the splitting you describe.
>>
>> The splitting isn't needed by the Range module. It's needed for the
highlighter only,
>
>
> I don't know about other code, but (for example) our code in our
Annotator fork _assumes_ that all NormalizedRanges start/end at text node
borders. (Which is only true after the splitting.)
>

Could you be specific? What assumes this other than the highlighter span
code?

>
> Removing the splitting from this module would change the definition of
NormalizedRange. (Because it used to start/end at text node borders, but
would no longer do so.)
> Doing this would break the code that depends on this behavior. (Like our
code.)
>
> In fact I would like to remove this behavior in the long run, but when we
do that, it will have to involve a serious refactoring.
> (And I would like to avoid doing that right away, because I want to
finish the current round of refactoring ASAP.
> I would like to avoid any further scope creep. As you have keenly
observed, we are prone to that problem.)
>

I'm not suggesting any creep. In fact, I very specifically said we could
take the module out with no changes. But I did want to plot a course for
its future and name it accordingly.

>
> > Serialization is definitely something I'd leave out, unless this is the
core part of the module you care about.
>
> Actually, it is a core part.
>
> That's the most important functionality we are using: turning the native
browser ranges into XPath expressions, and vice versa.
>
>
>> In that case, I'd call it xpath-range because that's what it's doing
>
>
> OK, I like the name.
>
>
>> Legacy is a bad word, especially for an independent library. It does
something. It does it now. It's not going to disappear completely. It's not
getting replaced by a new implementation of the same thing. Whether some
other project (Annotator or otherwise) needed it in the past, needs it now,
or will need it in the future seems totally irrelevant.
>
>
> OK, xpath-range it is.
>
>    Kristof
>
> PS: In case you moved this conversation off-list unintentionally, it's OK
to forward this back.
>

Okay. It seems you took it off list, though, with the following message, so
I'll quote that here for everyone:

>> On Jun 17, 2014 5:30 AM, "Kristof Csillag" <csillag at hypothes.is> wrote:
>>>
>>> Let me describe my perspective.
>>>
>>> I was not yet on the scene when the current Range implementation was
conceived; based on some vague remarks from some of the people who were
involved, here is what I imagine to have happened.
>>>
>>> (People with actual memories (instead of mere speculation) please step
forward.)
>>>
>>> The Range module was built to work around incompatibilities between
the various browsers, some of which could not properly work with ranges that
start or end inside text nodes. Hence, they came up with the process of
"Normalization": when we are working with a range that starts or ends
inside a text node, we split the text nodes at the range borders, so that
we end up with a range that always wraps entire text nodes. This way, the
implementation problems of some of the browsers (which were observed 3+
years ago) can be avoided.
>>>
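To make the splitting step concrete, here is a minimal TypeScript sketch. It is not Annotator's code and does not use the real DOM: `TNode`, `el`, `txt`, `splitText` and `normalizeBySplitting` are hypothetical names over a toy node model, with `splitText` loosely modelled on the DOM's `Text.splitText()`.

```typescript
// Hypothetical stand-in for DOM nodes so the sketch runs outside a browser.
// "#text" marks a text node; element nodes use their tag name.
interface TNode {
  name: string;
  text?: string;
  parent?: TNode;
  children: TNode[];
}

function el(name: string, ...children: TNode[]): TNode {
  const node: TNode = { name, children };
  for (const child of children) child.parent = node;
  return node;
}

function txt(text: string): TNode {
  return { name: "#text", text, children: [] };
}

// A range whose boundaries may fall inside text nodes (like a native Range).
interface RangeLike {
  startNode: TNode;
  startOffset: number;
  endNode: TNode;
  endOffset: number;
}

// A range that starts and ends on whole text nodes.
interface WholeNodeRange {
  start: TNode;
  end: TNode;
}

// Split `node` at `offset`, insert the tail as the next sibling and return it.
// Like DOM Text.splitText(), this MUTATES the tree.
function splitText(node: TNode, offset: number): TNode {
  const parent = node.parent;
  if (node.text === undefined || !parent) throw new Error("expected an attached text node");
  const tail: TNode = { name: "#text", text: node.text.slice(offset), parent, children: [] };
  node.text = node.text.slice(0, offset);
  parent.children.splice(parent.children.indexOf(node) + 1, 0, tail);
  return tail;
}

function normalizeBySplitting(r: RangeLike): WholeNodeRange {
  let end = r.endNode;
  // Split at the end first, so that startOffset still refers to the original text.
  if (r.endOffset < (end.text ?? "").length) splitText(end, r.endOffset);
  let start = r.startNode;
  if (r.startOffset > 0) {
    start = splitText(start, r.startOffset);
    // If both boundaries were in the same text node, the selection is now `start`.
    if (r.startNode === r.endNode) end = start;
  }
  return { start, end };
}
```

Selecting "world" in a single `txt("Hello world")` leaves its parent with two text nodes instead of one: the range now wraps whole nodes, but only because the tree itself was changed.
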
>>> The problem with this approach is that it actually *changes* the DOM,
instead of just *describe* places in it.
>>>
>>> A typical use case is the following:
>>>
>>>  - We create a BrowserRange object, based on an actual native browser
range object.
>>>  - This BrowserRange object is just a thin wrapper around the native
browser range (which is a description of a segment of the page)
>>>  - We want to know more about the location of the range, so we want to
serialize it, to get the XPath expressions
>>>  - We do this by calling the serialize() method.
>>>  - This does get us the serialized description of the range, but as a
side effect, it also calls normalize(), which changes the DOM.
>>>
>>> The problem with this is that our intention was only to work with some
data structures (describing a segment of the DOM), but doing
transformations on these data structures has an unintended side effect on
the described object (the DOM) itself.
>>>
>>> I don't think this is desired.
>>>
>>> In many cases, Annotator (and related applications) must work on and
over content which is "foreign", i.e. not managed by us.
>>> In those cases, it's just not nice to arbitrarily change pieces of the
DOM, just for our own convenience.
>>> In many cases, it does not matter, but there can be situations where
there is some active code on the page, which is confused by what we are
doing.
>>> (I have seen such examples.)
>>>
>>> I understand that this solved some problems earlier, but (according to
some feedback) those problems (which earlier had prevented us from working
with ranges starting and ending in the middle of text nodes) might have
been solved since then.
>>>
>>> This is why I think that in the long run, we should leave this
normalization approach behind.
>>>
>>> It should be entirely possible to "serialize" a range (i.e. calculate
the XPath description of it) without changing anything in the DOM.
>>>
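As an illustration of serializing without touching the tree, the following TypeScript sketch computes one XPath per boundary plus the original offset and never splits anything. The toy node model, the function names and the `SerializedLike` shape are all hypothetical; this is not necessarily the format Annotator's SerializedRange uses.

```typescript
// Hypothetical stand-in for DOM nodes; "#text" marks a text node.
interface TNode {
  name: string;
  text?: string;
  parent?: TNode;
  children: TNode[];
}

function el(name: string, ...children: TNode[]): TNode {
  const node: TNode = { name, children };
  for (const child of children) child.parent = node;
  return node;
}

function txt(text: string): TNode {
  return { name: "#text", text, children: [] };
}

interface RangeLike {
  startNode: TNode;
  startOffset: number;
  endNode: TNode;
  endOffset: number;
}

// Hypothetical serialized shape: one XPath plus an offset per boundary.
interface SerializedLike {
  start: string;
  startOffset: number;
  end: string;
  endOffset: number;
}

// Build an XPath such as "/p[2]/text()[1]" for `node`, relative to `root`.
// Only reads the tree; nothing is split or moved.
function xpathFromNode(node: TNode, root: TNode): string {
  const steps: string[] = [];
  let cur = node;
  while (cur !== root) {
    const here = cur;
    const parent = here.parent;
    if (!parent) throw new Error("node is not inside root");
    // XPath positions are 1-based and count only siblings with the same node test.
    const position = parent.children.filter((c) => c.name === here.name).indexOf(here) + 1;
    steps.unshift(`${here.name === "#text" ? "text()" : here.name}[${position}]`);
    cur = parent;
  }
  return "/" + steps.join("/");
}

function serializeWithoutSplitting(r: RangeLike, root: TNode): SerializedLike {
  return {
    start: xpathFromNode(r.startNode, root),
    startOffset: r.startOffset,
    end: xpathFromNode(r.endNode, root),
    endOffset: r.endOffset,
  };
}
```

The offsets stay relative to the unsplit text nodes. The trade-off of pointing at individual text nodes is that such paths only resolve against a page with the same text-node structure.
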
>>> There are some situations when we do need this behavior (for example
when we want to wrap a range inside a SPAN, for highlighting), but
>>>  - even for this kind of functionality, we can find better,
less-invasive alternatives
>>>  - we should not do changes in the cases when we just want to
*describe* a range, and don't intend to do anything about this, but
currently, the two steps are bundled together.
>>>
>>>    * * *
>>>
>>> The next point to make is that this normalization approach is not only
an implementation detail; it has consequences on the API.
>>>
>>> Any code currently working with normalized ranges assumes (correctly)
that the range always starts exactly before the starting node of the range,
and not somewhere inside it. (The same goes for the end.) This is an
artificial limitation compared with the original browser Range
specification, which does support ranges starting or ending inside text
nodes.
>>>
>>> So if we change the behavior of the range implementation, but we keep
the API, we will break the code that uses it.
>>>
>>> This is why I say that we should not promote this range module (neither
the implementation, nor the API) as a recommended building block for future
functionality. That's why I would prefer to give the module we ship it in a
name indicating so, for example
>>>  - LegacyRange (for obvious reasons) or
>>>  - MagicRange (describing my original sensation when I discovered what
is going on in the bowels of the implementation), or
>>>  - anything else.
>>>
>>> What I wanted to avoid is having a name which would suggest that it's
part of the family of general purpose, modern, future-proof utilities we
are building (like dom-text-mapper, dom-anchor-core, etc) for Annotator and
related tools.
>>>
>>> We might want to build such a library later on, but this is not it; I
just want to preserve the current functionality, and plug it into the new
framework.
>>>
>>> There are some clear possibilities of improvement in this field, but
the goal of my current project is not solving the problems with ranges, but
to repackage the already existing functionality (in upstream Annotator,
Hypothes.is's fork of Annotator, and a few more related projects) into a
generally useful package.
>>>
>>>    * * *
>>>
>>> What to package into the range module?
>>>
>>> I just wanted to create a package which we could use as a dependency
for both the dom-anchor family of libraries and for Annotator (until we can
migrate Annotator to use the dom-anchor library family instead.) That would
simply involve moving the current implementation into a separate module, without
making any change or refactoring whatsoever. Naturally, that would include
the serialization feature, too, which is a core feature of the
implementation. That means that we would move the xpath utility along with
it, but that's fine, because the Range module is the only part actually
using it.
>>>
>>> So xpath, range (and a few methods from util) would go into the new
module.
>>>
>>> Any project (including Annotator) could simply use this module (via npm
and browserify, as usual), and the functionality and API would be exactly
identical to what we have now.
>>>
>>> That's my current approach.
>>>
>>>    Kristof
>>>
>>>
>>>
>>> On 2014-06-17 03:10, Randall Leeds wrote:
>>>>
>>>> On Wed, Jun 11, 2014 at 11:50 AM, Randall Leeds <tilgovi at hypothes.is>
wrote:
>>>>>
>>>>> +1 to releasing Range as a separate module.
>>>>>
>>>>> The best thing around besides ours seems to be Rangy and it
unfortunately seems to be acquiring more stuff, such as a highlighter.
>>>>>
>>>>> A good Range package would be a great thing for the web dev community.
>>>>
>>>>
>>>> I suggested to Kristof this morning that we create a "dom-range" repo
under openannotation.
>>>>
>>>> There was some disagreement [see transcript below] about whether this
is a good name.
>>>> In an attempt to resolve the issue, I'm taking stock of what's in this
module.
>>>>
>>>> Annotator.Range
>>>> =============
>>>>
>>>> This module implements three Range objects: BrowserRange,
SerializedRange, and NormalizedRange
>>>>
>>>> BrowserRange
>>>> ----------------------
>>>>
>>>> This serves to just provide a wrapper around the DOM Range object with
two additional functions.
>>>>
>>>> Its properties are the same as the DOM Range object, though it lacks
properties that aren't used in Annotator.
>>>>
>>>> In addition to these properties, it provides a normalize() and
serialize() call. The serialize() method returns an object fit for
serialization (duh). The normalize() method returns a NormalizedRange
instance after normalizing the range. Normalization involves moving the
start or end of the range based on some rules.
>>>>
>>>> NormalizedRange
>>>> ---------------------------
>>>>
>>>> This object provides different properties than the DOM Range, but
encapsulates the same concept. It also adds a few other methods.
>>>>
>>>> The serialize() method is actually the serialization as described in
BrowserRange. The BrowserRange serialize() method actually calls
normalize() first to get a NormalizedRange and then calls serialize() on
that.
>>>>
>>>> Most of serialization is a simple XPath builder with the added detail
that a relative root can be passed in.
>>>>
>>>> NormalizedRange objects also provide a function to get the text nodes
they contain, to get the string of text contained by those text nodes, and
to get a real DOM Range object.
>>>>
>>>> It's worth noting that NormalizedRange#text() is probably the
equivalent of the DOM Range #toString() method.
>>>>
>>>> A limit() method provides the ability to reduce the range to only the
nodes that fall inside the given container.
>>>>
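Assuming a normalized range's boundaries sit on whole text nodes, the text-node list, text() and limit() described above reduce to a few lines. This is a hypothetical TypeScript sketch on a toy node model; `textNodesIn`, `rangeText` and `limit` are illustrative names, not Annotator's code.

```typescript
// Hypothetical stand-in for DOM nodes; "#text" marks a text node.
interface TNode {
  name: string;
  text?: string;
  parent?: TNode;
  children: TNode[];
}

function el(name: string, ...children: TNode[]): TNode {
  const node: TNode = { name, children };
  for (const child of children) child.parent = node;
  return node;
}

function txt(text: string): TNode {
  return { name: "#text", text, children: [] };
}

// A range that starts and ends on whole text nodes (the normalized form).
interface WholeNodeRange {
  start: TNode;
  end: TNode;
}

// All text nodes under `root`, in document order (depth-first, pre-order).
function textNodesIn(root: TNode, out: TNode[] = []): TNode[] {
  if (root.name === "#text") out.push(root);
  for (const child of root.children) textNodesIn(child, out);
  return out;
}

// The text nodes covered by the range, from start to end inclusive.
function rangeTextNodes(r: WholeNodeRange, root: TNode): TNode[] {
  const all = textNodesIn(root);
  const from = all.indexOf(r.start);
  const to = all.indexOf(r.end);
  if (from < 0 || to < from) throw new Error("range is not inside root");
  return all.slice(from, to + 1);
}

// Concatenated text of those nodes (comparable to Range#toString()).
function rangeText(r: WholeNodeRange, root: TNode): string {
  return rangeTextNodes(r, root).map((n) => n.text ?? "").join("");
}

function isInside(node: TNode, container: TNode): boolean {
  for (let cur: TNode | undefined = node; cur; cur = cur.parent) {
    if (cur === container) return true;
  }
  return false;
}

// Shrink the range to the text nodes inside `container`; null if none remain.
function limit(r: WholeNodeRange, root: TNode, container: TNode): WholeNodeRange | null {
  const inside = rangeTextNodes(r, root).filter((n) => isInside(n, container));
  if (inside.length === 0) return null;
  return { start: inside[0], end: inside[inside.length - 1] };
}
```

Both functions depend on the whole-text-node invariant; with boundaries inside text nodes, the first and last nodes would have to be trimmed by their offsets.
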
>>>> SerializedRange
>>>> -------------------------
>>>>
>>>> This serves as an OO wrapper around a serializable Range. It contains
an XPath expression for each of the DOM Range object's start- and
endContainer properties as well as the start and end offsets. Its
normalize() method first attempts to resolve the XPath expressions to find
this range in the document and, having resolved the start- and endContainer
nodes, returns a NormalizedRange.
>>>>
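The resolving step SerializedRange's normalize() is described as doing could be sketched as below. It assumes a hypothetical toy node model and only a narrow path subset (`/name[position]` steps, with `text()` for text nodes); in a real browser, XPath evaluation would go through `document.evaluate()` instead.

```typescript
// Hypothetical stand-in for DOM nodes; "#text" marks a text node.
interface TNode {
  name: string;
  text?: string;
  parent?: TNode;
  children: TNode[];
}

function el(name: string, ...children: TNode[]): TNode {
  const node: TNode = { name, children };
  for (const child of children) child.parent = node;
  return node;
}

function txt(text: string): TNode {
  return { name: "#text", text, children: [] };
}

interface RangeLike {
  startNode: TNode;
  startOffset: number;
  endNode: TNode;
  endOffset: number;
}

// Hypothetical serialized shape: one XPath plus an offset per boundary.
interface SerializedLike {
  start: string;
  startOffset: number;
  end: string;
  endOffset: number;
}

// Resolve a path such as "/p[2]/text()[1]" against `root`.
function nodeFromXPath(path: string, root: TNode): TNode {
  let cur = root;
  for (const step of path.split("/")) {
    if (step === "") continue; // the leading "/" yields an empty first segment
    const m = /^(.+)\[(\d+)\]$/.exec(step);
    if (!m) throw new Error(`unsupported step: ${step}`);
    const name = m[1] === "text()" ? "#text" : m[1];
    const next = cur.children.filter((c) => c.name === name)[Number(m[2]) - 1];
    if (!next) throw new Error(`no node matches ${step}`);
    cur = next;
  }
  return cur;
}

function deserialize(s: SerializedLike, root: TNode): RangeLike {
  const startNode = nodeFromXPath(s.start, root);
  const endNode = nodeFromXPath(s.end, root);
  // Reject offsets that no longer fit, e.g. because the page text changed.
  if (s.startOffset > (startNode.text ?? "").length || s.endOffset > (endNode.text ?? "").length) {
    throw new Error("offset out of range");
  }
  return { startNode, startOffset: s.startOffset, endNode, endOffset: s.endOffset };
}
```

A failed lookup throws rather than guessing, which keeps "the page changed" distinguishable from a successful anchor.
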
>>>> Analysis
>>>> ======
>>>>
>>>> Very little in this module actually deviates from the DOM Range spec.
>>>>
>>>> - Mostly, we don't implement the methods.
>>>> - Serialization and deserialization could be kept in Annotator itself.
This avoids having to extract our xpath code. We can evaluate that
separately.
>>>> - Normalization and limiting are potentially reasonable proposals for
the Range spec. It makes me wonder whether a normalization algorithm in the
spec would have saved us from a lot of interop woes. Rangy implements a
normalization algorithm, too.
>>>>
>>>> Conclusion
>>>> ========
>>>>
>>>> I would support extracting the Range module from Annotator. However, I
don't see much value beyond normalization and limiting.
>>>>
>>>> The only other library on the radar that we see in use is Rangy.
>>>>
>>>> Rangy:
>>>>
>>>> - Implements Range and Selection in one library.
>>>> - The core weighs ~43KB minified compared with Annotator.Range at ~8KB.
>>>>
>>>> I think a library that just focuses on being a good compatibility
wrapper for Range and contains a cross-browser CI suite would be great.
It'd be useful to uncover and document which browsers implement the newest
WHATWG Range calls, #createContextualFragment(), #getClientRects(), and
#getBoundingClientRect(). I think defining a normalization algorithm,
whether it's a new method or something browsers do implicitly, would be
great for HTML Editing.
>>>>
>>>> Rangy does contain an implementation of
Range#createContextualFragment().
>>>>
>>>> So, I don't think there's much "magic" here:
>>>>
>>>> - The sniff function is only a few lines to detect between the three
Range subclasses
>>>> - Serialization is probably best left out (for instance, Rangy has a
CRC32 based serialization).
>>>> - Normalization is the core contribution that seems to be missing from
the standards.
>>>>
>>>> Mostly we'd gain a place to keep this neat and separate, add tests as
we go, and perhaps plug some browser incompatibilities. We also get a place
to define our normalization algorithm exactly and implement it with tests
so others can use it.
>>>>
>>>> Therefore, I'm in favor of just calling it "dom-range" and making it
implement the spec. Depending on the required browser compatibility, most
of the implementation can be trusted to the browser built-in Range object.
>>>>
>>>>
>>>> [Transcript]
>>>>
>>>> 16:30:11 <csillag1> let's call the new repo MagicRange
>>>> 16:30:19 <tilgovi> No.
>>>> 16:30:20 <tilgovi> I'd rather not.
>>>> 16:30:28 <csillag1> Range sounds way too generic.
>>>> 16:30:34 <csillag1> It's not a useful name to identify a piece of code.
>>>> 16:30:35 <tilgovi> I actually think we should just call it dom-range
>>>> 16:30:43 <csillag1> Please no.
>>>> 16:30:48 <csillag1> It's not something which we want to
>>>> 16:30:48 <tilgovi> For now it can be extracted unchanged
>>>> 16:30:55 <csillag1> promote
>>>> 16:30:56 <Treora> ah that repo, I was looking at the one in hypothe.is
which has only one line of documentation. :)
>>>> 16:31:05 <csillag1> it's legacy code, which has only been created
>>>> 16:31:08 <csillag1> because of historical reasons.
>>>> 16:31:19 <csillag1> Don't name it in a way that indicates
>>>> 16:31:24 <tilgovi> But the best would be if we made a library that
actually just implements a standard Range with tests and maybe new calls
for serialization
>>>> 16:31:24 <csillag1> that it's something generally useful
>>>> 16:31:29 <csillag1> part of our framework for the future, etc.
>>>> 16:31:31 <tilgovi> It should be generally useful
>>>> 16:31:33 <csillag1> I would actually get rid of it.
>>>> 16:31:58 <csillag1> It was created to work around problems
>>>> 16:32:02 <csillag1> which are probably long gone now.
>>>> 16:32:12 <tilgovi> That's not true or we wouldn't use it.
>>>> 16:32:15 <csillag1> The original research should be done again.
>>>> 16:32:23 <csillag1> We are using it because much of our code depends on
it.
>>>> 16:32:30 <tilgovi> And Rangy wouldn't exist.
>>>> 16:32:35 <csillag1> But we don't need to write new code around it.
>>>> 16:32:50 <csillag1> I don't know about the history of Rangy;
>>>> 16:33:02 <csillag1> I just know that the current Range implementation
>>>> 16:33:12 <csillag1> brings a large set of oddities along with it.
>>>> 16:33:17 <csillag1> It does solve other problems,
>>>> 16:33:22 <csillag1> but according to Nick,
>>>> 16:33:23 <tilgovi> What we see from IE issues is not that the MS team
has done something wrong, but that the HTML Editing specs are drafts. They
are vague. They have holes and requests for feedback.
>>>> 16:33:37 <csillag1> those problems are already gone now.
>>>> 16:33:41 <tilgovi> Rangy's stated goal is a spec compliant Range
>>>> 16:33:54 <csillag1> Well ... sounds like a nice goal,
>>>> 16:33:57 <csillag1> but it's not there yet.
>>>> 16:34:01 <csillag1> crashes under IE.
>>>> 16:34:05 <tilgovi> But the specs aren't even really compete
>>>> 16:34:06 <csillag1> Has a huge performance penalty.
>>>> 16:34:08 <tilgovi> Complete
>>>>
>>>>
>>>>>
>>>>> On Jun 11, 2014 8:18 AM, "Kristof Csillag" <csillag at hypothes.is>
wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> As some of you already know, currently I am working on separating all
>>>>>> the anchoring-related work which Annotator does (both in the
>>>>>> upstream version and in the Hypothes.is fork) into a separate
library,
>>>>>> which Annotator and other projects could use, but which can be
developed
>>>>>> independently.
>>>>>>
>>>>>> As a part of this problem, I need to have the current Range
>>>>>> implementation (BrowserRange, NormalizedRange, SerializedRange) in
that
>>>>>> library, too.
>>>>>>
>>>>>> How would you feel about releasing this part (the Range
implementation)
>>>>>> as a separate NPM package, so that it can be plugged in easily
wherever
>>>>>> we need it?
>>>>>>
>>>>>>    Kristof
>>>>>> _______________________________________________
>>>>>> annotator-dev mailing list
>>>>>> annotator-dev at lists.okfn.org
>>>>>> https://lists.okfn.org/mailman/listinfo/annotator-dev
>>>>>> Unsubscribe: https://lists.okfn.org/mailman/options/annotator-dev
>>>>
>>>>
>>>
>