Posted to j-dev@xerces.apache.org by Lynn Monson <lm...@flipdog.com> on 2000/10/10 22:58:34 UTC

RE: Suggestions for improving RangeImpl.compareBoundaryPoints

When you have a moment to review the changes, I'd welcome any comments.  I
have several other range fixes and optimizations I'd like to submit; before
doing so, though, it seemed prudent to submit these changes and see if they
seemed reasonable to those on the list.  Thanks so much for reviewing them.

Lynn Monson

RE: Suggestions for improving RangeImpl.compareBoundaryPoints

Posted by Lynn Monson <lm...@flipdog.com>.

I apologize for beating this drum.  I've re-read the spec wording and think
I understand it (now).  Unfortunately, I still think the code is in error.
To see why, take a very simple example:

Text node: abc
Let aRange span the character "a"
Let sourceRange span the character "c"

If I invoke aRange.compareBoundaryPoints( START_TO_START, sourceRange ):

Xerces code says:

    if (how == START_TO_START) {
        endPointA = sourceRange.getStartContainer();
        endPointB = fStartContainer;
        offsetA = sourceRange.getStartOffset();
        offsetB = fStartOffset;
    } else

So far, so good.  "A" is sourceRange and "B" is aRange.  They have the same
container, and offsetA==2 and offsetB==0.  The next lines executed are:

   // case 1: same container
   if (endPointA == endPointB) {
       if (offsetA < offsetB) return -1;
       if (offsetA == offsetB) return 0;
       return 1;
   }

Since offsetA > offsetB, a +1 is returned.  But this contradicts the (new)
spec wording which says the result is:

  -1, 0 or 1 depending on whether the corresponding boundary-point of the
  Range is, respectively, before, equal to, or after the corresponding
  boundary-point of sourceRange

In the example, offsetB is the "corresponding boundary-point of the Range"
and offsetA is the "corresponding boundary-point of sourceRange."  Since
offsetB is before offsetA, shouldn't the answer be -1 instead of +1?
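
The disagreement in the example above comes down to which operand is treated
as the one being described.  A minimal Java sketch, assuming both boundary
points sit in the same container so that only the offsets matter.  The class
and method names (SameContainerCompare, xercesStyle, thisRelativeToSource)
are made up for illustration and are not part of Xerces:

```java
// Hypothetical illustration only; not Xerces code.
class SameContainerCompare {

    // Mirrors the quoted "case 1" logic: A is sourceRange, B is the range
    // the method was invoked on, and the result describes A relative to B.
    static int xercesStyle(int sourceOffset, int thisOffset) {
        int offsetA = sourceOffset;
        int offsetB = thisOffset;
        if (offsetA < offsetB) return -1;
        if (offsetA == offsetB) return 0;
        return 1;
    }

    // The reading argued for in this message: the result describes the
    // invoking range's boundary point relative to sourceRange's.
    static int thisRelativeToSource(int sourceOffset, int thisOffset) {
        if (thisOffset < sourceOffset) return -1;
        if (thisOffset == sourceOffset) return 0;
        return 1;
    }

    public static void main(String[] args) {
        // Text node "abc": aRange spans "a" (start offset 0),
        // sourceRange spans "c" (start offset 2).
        int thisStart = 0;
        int sourceStart = 2;
        System.out.println("A relative to B:         " + xercesStyle(sourceStart, thisStart));
        System.out.println("this relative to source: " + thisRelativeToSource(sourceStart, thisStart));
    }
}
```

The two methods differ only in operand order, so they always return results
of opposite sign unless the offsets are equal.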

RE: Suggestions for improving RangeImpl.compareBoundaryPoints

Posted by Lynn Monson <lm...@flipdog.com>.

Thanks, Arnaud, for researching this.  Just to be clear, there are two
separate issues:

1) If I invoke aRange.compareBoundaryPoints( START_TO_END, sourceRange ), I
need to decide which range's start point and which range's end point to
consider.  If I understand correctly, the official answer is: start of
"sourceRange" and end of "aRange".

2) After making decision #1, you have to decide which of the two boundary
points is the reference point.  If I understand the language correctly, the
reference point is the "end" of aRange (given the new "respectively"
wording).

So... If I have a text node, say "abcde", with aRange spanning "abcde" and
sourceRange spanning "bc", then:

aRange.compareBoundaryPoints( START_TO_END, sourceRange ) == +1

Because we are comparing the end of aRange to the start of sourceRange.

FWIW... The proposed wording change addressed point #2 above, but does not
seem to clarify point #1.
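
The two decisions above can be sketched in Java.  This is only an
illustration under stated assumptions: both ranges live in one text node, so
plain offsets stand in for boundary points; the names
(BoundaryPointSelection, How, OffsetRange) are hypothetical; and the pairing
for END_TO_START is assumed to mirror the START_TO_END pairing described in
point #1.

```java
// Hypothetical illustration only; not Xerces code. Offsets stand in for
// boundary points, which is valid only when both ranges share one container.
class BoundaryPointSelection {

    enum How { START_TO_START, START_TO_END, END_TO_END, END_TO_START }

    // Minimal stand-in for a range inside a single text node.
    static final class OffsetRange {
        final int start;
        final int end;
        OffsetRange(int start, int end) { this.start = start; this.end = end; }
    }

    // Returns -1, 0 or 1 for the invoking range's point relative to
    // sourceRange's point, following the reading given in this message.
    static int compare(OffsetRange self, How how, OffsetRange source) {
        int selfPoint;
        int sourcePoint;
        switch (how) {
            case START_TO_START: sourcePoint = source.start; selfPoint = self.start; break;
            case START_TO_END:   sourcePoint = source.start; selfPoint = self.end;   break;
            case END_TO_END:     sourcePoint = source.end;   selfPoint = self.end;   break;
            default:             sourcePoint = source.end;   selfPoint = self.start; break; // END_TO_START
        }
        if (selfPoint < sourcePoint) return -1;
        if (selfPoint == sourcePoint) return 0;
        return 1;
    }

    public static void main(String[] args) {
        OffsetRange aRange = new OffsetRange(0, 5);      // spans "abcde"
        OffsetRange sourceRange = new OffsetRange(1, 3); // spans "bc"
        System.out.println(compare(aRange, How.START_TO_END, sourceRange));
    }
}
```

With the "abcde" / "bc" ranges, START_TO_END compares offset 5 (end of
aRange) with offset 1 (start of sourceRange).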

Thanks again
Lynn Monson

Re: Suggestions for improving RangeImpl.compareBoundaryPoints

Posted by Arnaud Le Hors <le...@us.ibm.com>.

Lynn Monson wrote:
> 
> > I can see the ambiguity. The spec could use some clarification. But
> > I'm not sure the original Xerces code is wrong though. What makes you
> > believe so?
> 
> I don't have hard evidence that the Xerces interpretation is wrong.  I was
> basing my conclusion on three factors:
> 
> * At previous employment, I used to follow the DOM working group
> discussions.  My foggy memory recalled that prior to compareBoundaryPoints
> existing as a method, there were previous range comparison methods that
> compared the arguments the other way around.
> 
> * Having "this" be the reference point for comparison against the argument
> seemed more consistent with other OO frameworks I've worked with.  Thus,
> "less than" means that "this" is less than the argument.  For example,
> Java's Comparable interface works that way.
> 
> * Outside of the Xerces code, I couldn't find countervailing evidence that
> suggested an opposite interpretation.  Although there may be some and I just
> missed it.  Perhaps the argument name, "sourceRange", is meant to be the tip
> off.
> 
> Either interpretation is fine with me.  It just seemed, on balance, that an
> opposite interpretation from the Xerces code was more likely.  If you can
> clarify it, I would greatly appreciate it.

I forwarded the issue to the W3C DOM Working Group, which confirmed that
Xerces' current implementation matches the spec. It was acknowledged that
having "sourceRange" be the reference, as opposed to "this", is
counterintuitive and that the other way around would have been better, but
it is too late to change the spec (it is meant to become a Recommendation
very soon now; the Proposed Recommendation review period has ended).
To try to clarify the spec, it was decided to add "respectively" to the
description of the value returned by compareBoundaryPoints:

-1, 0 or 1 depending on whether the corresponding boundary-point of the
Range is, respectively, before, equal to, or after the corresponding
boundary-point of sourceRange
-- 
Arnaud  Le Hors - IBM Cupertino, XML Technology Group

RE: Suggestions for improving RangeImpl.compareBoundaryPoints

Posted by Lynn Monson <lm...@flipdog.com>.

> I can see the ambiguity. The spec could use some clarification. But
> I'm not sure the original Xerces code is wrong though. What makes you
> believe so?

I don't have hard evidence that the Xerces interpretation is wrong.  I was
basing my conclusion on three factors:

* At previous employment, I used to follow the DOM working group
discussions.  My foggy memory recalled that prior to compareBoundaryPoints
existing as a method, there were previous range comparison methods that
compared the arguments the other way around.

* Having "this" be the reference point for comparison against the argument
seemed more consistent with other OO frameworks I've worked with.  Thus,
"less than" means that "this" is less than the argument.  For example,
Java's Comparable interface works that way.

* Outside of the Xerces code, I couldn't find countervailing evidence that
suggested an opposite interpretation.  Although there may be some and I just
missed it.  Perhaps the argument name, "sourceRange", is meant to be the tip
off.

Either interpretation is fine with me.  It just seemed, on balance, that an
opposite interpretation from the Xerces code was more likely.  If you can
clarify it, I would greatly appreciate it.
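
For reference, the Comparable convention mentioned in the second point looks
like this in Java.  The class and method names (ComparableConvention,
receiverIsLess) are made up for illustration:

```java
// Illustration of the java.lang.Comparable convention; names are hypothetical.
class ComparableConvention {

    // True when the receiver ("this" in compareTo) orders before the argument.
    static boolean receiverIsLess(Integer receiver, Integer argument) {
        // A negative result means the receiver is less than the argument.
        return receiver.compareTo(argument) < 0;
    }

    public static void main(String[] args) {
        System.out.println(receiverIsLess(0, 2));
        System.out.println(receiverIsLess(2, 0));
    }
}
```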

Lynn

RE: Suggestions for improving RangeImpl.compareBoundaryPoints

Posted by Lynn Monson <lm...@flipdog.com>.

Hi Arnaud,

I was looking through the 1.2.1 Xerces release and noticed that only part of
the optimization made it into the code.  Specifically, the "case 2:" and
"case 3:" conditions still use a repetitive tree walking algorithm rather
than a one-pass ancestor test.  I was just wondering if there was an
objection to those optimizations or if it was just an oversight.  Thanks for
any feedback.
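
As a rough sketch of what a one-pass ancestor test can look like (this is
not the patch itself, and the TreeNode class and childOnPath method are
hypothetical stand-ins rather than Xerces or DOM types), the parent chain of
one container is walked a single time, remembering the child through which
the other container is reached:

```java
// Hypothetical illustration only; not the patch discussed in this thread.
class AncestorWalk {

    // Minimal stand-in for a DOM node: only the parent link is needed here.
    static final class TreeNode {
        final String name;
        final TreeNode parent;
        TreeNode(String name, TreeNode parent) { this.name = name; this.parent = parent; }
    }

    // Walks the parent chain of 'descendant' exactly once. If 'ancestor' is
    // found, returns the child of 'ancestor' that lies on the path down to
    // 'descendant'; otherwise returns null. Both arguments must be non-null.
    static TreeNode childOnPath(TreeNode ancestor, TreeNode descendant) {
        for (TreeNode c = descendant, p = c.parent; p != null; c = p, p = p.parent) {
            if (p == ancestor) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode("root", null);
        TreeNode section = new TreeNode("section", root);
        TreeNode text = new TreeNode("text", section);
        TreeNode other = new TreeNode("other", root);

        System.out.println(childOnPath(root, text).name); // child of root on the way to text
        System.out.println(childOnPath(other, text));     // other is not an ancestor of text
    }
}
```

Once the child on the path is known, its index among the ancestor's children
can be compared with the other boundary point's offset, without repeating
the walk for each candidate child.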

Lynn Monson

Re: Suggestions for improving RangeImpl.compareBoundaryPoints

Posted by Arnaud Le Hors <le...@us.ibm.com>.

Hi Lynn,
I finally managed to take some time to review your patch. Your patch
includes two changes. I have no problem with the second one (the
optimization). However, I'm not sure about the first one:

>         // Note: There is ambiguity in the specification about which
>         // boundary points we are comparing.  Is START_TO_END, for example,
>         // comparing the start of "this" to the end of "sourceRange" or
>         // the other way around?  The original Xerces code took the latter
>         // interpretation, but we believe that to be incorrect.

I can see the ambiguity. The spec could use some clarification. But I'm
not sure the original Xerces code is wrong though. What makes you
believe so?
It's a matter of which of the two ranges you take as the reference,
right? Based on the spec at
http://www.w3.org/TR/DOM-Level-2-Traversal-Range/ranges.html#Level-2-Range-Comparing
-----
The return value is -1, 0 or 1 depending on whether the corresponding
boundary-point of the Range is before, equal to, or after the
corresponding boundary-point of sourceRange.
-----

I'd say the sourceRange is the reference and "this" is tested against
the sourceRange, which I believe is what Xerces does.
What do you think? I'll be happy to raise the issue to the DOM Working
Group for clarification.
-- 
Arnaud  Le Hors - IBM Cupertino, XML Technology Group