Posted to dev@harmony.apache.org by Tim Ellison <t....@gmail.com> on 2006/10/10 02:50:06 UTC

[classlib][regex|luni] build break

So I checked in a patch for HARMONY-688's regex fix, and it passed the
regex unit tests, but causes the existing luni tests to fail in
java.util.Scanner.  I've not figured out the base cause of the failure
so I've backed out the changes.

Regards,
Tim

-- 

Tim Ellison (t.p.ellison@gmail.com)
IBM Java technology centre, UK.

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [classlib][regex|luni] build break

Posted by Spark Shen <sm...@gmail.com>.
Spark Shen wrote:
> Tim Ellison wrote:
>> So I checked in a patch for HARMONY-688's regex fix, and it passed the
>> regex unit tests, but causes the existing luni tests to fail in
>> java.util.Scanner.  I've not figured out the base cause of the failure
>> so I've backed out the changes.
>>
>> Regards,
>> Tim
>>
>>   
> Hi regular expression guys:
>
> After applying HARMONY-688 in my local environment, the following test
> case fails on Harmony but passes on the RI:
> public void test_misc() {
> String pattern = "*(\\p{javaDigit})++*";
I need to clarify here: the pattern is (\\p{javaDigit})++, not
*(\\p{javaDigit})++*. I set the pattern in bold, but when the mail was
sent the bold markers turned into two asterisks.
> Matcher mat = Pattern.compile(pattern).matcher("123");
> assertTrue(mat.matches());
> }
>
> Since the Scanner implementation relies heavily on the pattern
> "(\\p{javaDigit})++" to recognize integers, many of the Scanner
> tests fail.
> A greedy quantifier works fine here.
But a greedy quantifier is semantically insufficient; the double plus sign
(the possessive quantifier) really is needed.
>
> Would any regular expression guru look at this issue?
>
> Best regards
>


-- 
Spark Shen
China Software Development Lab, IBM




Re: [classlib][regex|luni] build break

Posted by Spark Shen <sm...@gmail.com>.
Tim Ellison wrote:
> So I checked in a patch for HARMONY-688's regex fix, and it passed the
> regex unit tests, but causes the existing luni tests to fail in
> java.util.Scanner.  I've not figured out the base cause of the failure
> so I've backed out the changes.
>
> Regards,
> Tim
>
>   
Hi regular expression guys:

After applying HARMONY-688 in my local environment, the following test case
fails on Harmony but passes on the RI:
public void test_misc() {
    String pattern = "(\\p{javaDigit})++";
    Matcher mat = Pattern.compile(pattern).matcher("123");
    assertTrue(mat.matches());
}

Since the Scanner implementation relies heavily on the pattern
"(\\p{javaDigit})++" to recognize integers, many of the Scanner
tests fail.
A greedy quantifier works fine here.
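
The difference between the greedy and the possessive form of this pattern can be sketched with the standard java.util.regex API. This is only an illustration, not code from Harmony or from Scanner; the class name PossessiveQuantifierSketch and the helper fullMatch are made up for the example. Both forms accept "123", but only the greedy form backtracks to let a trailing literal match.

```java
import java.util.regex.Pattern;

class PossessiveQuantifierSketch {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Both quantifiers accept a plain run of digits.
        System.out.println(fullMatch("(\\p{javaDigit})++", "123")); // true
        System.out.println(fullMatch("(\\p{javaDigit})+", "123"));  // true

        // Greedy: takes "123", then gives back the "3" so the literal can match.
        System.out.println(fullMatch("(\\p{javaDigit})+3", "123"));  // true
        // Possessive: takes "123" and never gives anything back.
        System.out.println(fullMatch("(\\p{javaDigit})++3", "123")); // false
    }
}
```

The last two lines show why the two forms are not interchangeable in a larger expression, even though they agree on the simple test case above.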

Would any regular expression guru look at this issue?

Best regards

-- 
Spark Shen
China Software Development Lab, IBM




Re: [classlib][regex|luni] build break

Posted by Anton Ivanov <an...@gmail.com>.
The patch for the issue
http://issues.apache.org/jira/browse/HARMONY-688
has been corrected. I also attached to the issue an updated version of the
unit test patch (there have been new commits to the java.util.regex unit
tests, so the old unit test patch is out of date).
Please try applying the patch once again.

Thanks,
Anton



On 10/10/06, Tim Ellison <t....@gmail.com> wrote:
>
> So I checked in a patch for HARMONY-688's regex fix, and it passed the
> regex unit tests, but causes the existing luni tests to fail in
> java.util.Scanner.  I've not figured out the base cause of the failure
> so I've backed out the changes.
>
> Regards,
> Tim
>
> --
>
> Tim Ellison (t.p.ellison@gmail.com)
> IBM Java technology centre, UK.
>
>
>

Re: [classlib][regex|luni] build break

Posted by Spark Shen <sm...@gmail.com>.
Spark Shen wrote:
> Tim Ellison wrote:
>> So I checked in a patch for HARMONY-688's regex fix, and it passed the
>> regex unit tests, but causes the existing luni tests to fail in
>> java.util.Scanner.  I've not figured out the base cause of the failure
>> so I've backed out the changes.
>>
>> Regards,
>> Tim
>>
>>   
> Hi
> I will look into the issue.
>
> Best regards
>
Hi All:
After applying HARMONY-688 in my local environment at r454586,
I saw several test cases fail on RI 1.5.08. They are:
testCanonEqFlag
testIndexesCanonicalEq
testCanonEqFlagWithSupplementaryCharacters
testPredefinedClassesWithSurrogatesSupplementary

All the failed assertions concern supplementary characters. Are these
bugs in the RI or in Harmony?

I am still investigating what the problem with j.u.Scanner is.

Best regards

-- 
Spark Shen
China Software Development Lab, IBM




Re: [classlib][regex|luni] build break

Posted by Spark Shen <sm...@gmail.com>.
Tim Ellison wrote:
> So I checked in a patch for HARMONY-688's regex fix, and it passed the
> regex unit tests, but causes the existing luni tests to fail in
> java.util.Scanner.  I've not figured out the base cause of the failure
> so I've backed out the changes.
>
> Regards,
> Tim
>
>   
Hi
I will look into the issue.

Best regards

-- 
Spark Shen
China Software Development Lab, IBM




Re: [classlib][regex|luni] build break

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
test package + 1
test package + 1
test package + 1
test package + 1
test package + 1
test package + 1
test package + 1

(no, I'm not volunteering)

Tim Ellison wrote:
> So I checked in a patch for HARMONY-688's regex fix, and it passed the
> regex unit tests, but causes the existing luni tests to fail in
> java.util.Scanner.  I've not figured out the base cause of the failure
> so I've backed out the changes.
> 
> Regards,
> Tim
> 



Re: [classlib][regex|luni] build break

Posted by Spark Shen <sm...@gmail.com>.
Anton Ivanov wrote:
> Hi Spark.
>
> I think there is only one problem :)
>
> Namely, one unit test in PatternTest simply uses System.out.println().
> As a result, you see the output with all these
> java.util.regex.PatternSyntaxExceptions.
> This test output was left in PatternTest by mistake.
> I agree that such tests can cause confusion about which tests fail and
> which do not.
> So I removed this debug output. I attached an updated patch for the unit
> tests, with this small fix, to the Unicode supplementary characters issue.
Oh, I see. That's better. Thanks

Best regards
>
>
> Thanks,
> Anton
> On 10/13/06, Spark Shen <sm...@gmail.com> wrote:
>>
>> Hi Anton:
>>
>> There are still two problems here:
>>
>> 1. Error message printed out on Harmony in console:
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
>> b)a
>> ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
>> bcde)a
>> ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
>> bbg())a
>> ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
>> cdb(?i))a
>>
>> 2. Some test cases in PatternTest simply use System.out.println()
>> instead of assertions, so failed test cases
>> cannot easily be found in the JUnit output
>>
>> Best regards
>>
>> Anton Ivanov wrote:
>> > I documented the details on both JIRA issues:
>> > http://issues.apache.org/jira/browse/HARMONY-688
>> > http://issues.apache.org/jira/browse/HARMONY-933
>> > So, please mark these issues as non-bug-differences if needed.
>> >
>> > Thanks,
>> > Anton
>> >
>> > On 10/12/06, Paulex Yang <pa...@gmail.com> wrote:
>> >>
>> >> Anton Ivanov wrote:
>> >> > The problem is in the RI. These failures are the RI bugs.
>> >> >
>> >> > The test failures on the RI you pointed out can be grouped into the
>> >> two
>> >> I guess you meant three ;-)
>> >> > categories:
>> >> Is category2, the supplemental character issue, included in the
>> >> HARMONY-933? How about to document the details like below on that 
>> JIRA,
>> >> and mark it as non-bug difference?
>> >> >
>> >> > 1. Canonical equivalence related.
>> >> >
>> >> > java.util.regex.PatternSyntaxException: Unclosed group near 
>> index 59
>> >> > (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
>> >> > ^
>> >> > at java.util.regex.Pattern.error(Pattern.java:1650)
>> >> > at java.util.regex.Pattern.accept(Pattern.java:1508)
>> >> > at java.util.regex.Pattern.group0(Pattern.java:2460)
>> >> > at java.util.regex.Pattern.sequence(Pattern.java:1715)
>> >> > at java.util.regex.Pattern.expr(Pattern.java:1687)
>> >> > at java.util.regex.Pattern.compile(Pattern.java:1397)
>> >> > at java.util.regex.Pattern.<init>(Pattern.java:1124)
>> >> > at java.util.regex.Pattern.compile(Pattern.java:840)
>> >> > at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)
>> >> >
>> >> > The RI fails to compile the following pattern with CANON_EQ flag
>> >> > specified:
>> >> >       "\u01E0\u00CCcdb(ac)"
>> >> > This is due to the RI tries to build alternations to take into
>> account
>> >> > canonical equivalence.
>> >> > And the RI does so in simple cases. But if pattern is a little more
>> >> > complex the RI fails to compile it.
>> >> > So the RI builds these alternations wrong.
>> >> > You can see the following bug:
>> >> > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4867170
>> >> >
>> >> > I wrote about these test failures on the RI here:
>> >> > http://issues.apache.org/jira/browse/HARMONY-933
>> >> >
>> >> > 2. Supplementary Unicode codepoints related.
>> >> >
>> >> > For example let's see at:
>> >> >
>> >> > testPredefinedClassesWithSurrogatesSupplementary
>> >> > junit.framework.AssertionFailedError: null
>> >> > at junit.framework.Assert.fail(Assert.java:47)
>> >> > at junit.framework.Assert.assertTrue(Assert.java:20)
>> >> > at junit.framework.Assert.assertFalse(Assert.java:34)
>> >> > at junit.framework.Assert.assertFalse(Assert.java:41)
>> >> > at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)
>> >> >
>> >> > Here we try to find surrogate character in a codepoint 
>> \uD916\uDE27.
>> >> > It is written here:
>> >> > http://www.unicode.org/reports/tr18/#Supplementary_Characters
>> >> >
>> >> > "Surrogate pairs (or their equivalents in other encoding forms) are
>> >> > to be handled internally as single code point values"
>> >> >
>> >> > So we have to treat text as code points not code units.
>> >> > Here \uD916\uDE27 is a one code point consisting of
>> >> > two code units (two surrogate characters) so we find nothing.
>> >> > (I added a comment with this explanation to the
>> >> > testPredefinedClassesWithSurrogatesSupplementary()).
>> >> > But the RI doesn't treat this codepoint as a single whole, this is
>> the
>> >> RI
>> >> > bug
>> >> > and this is wrong according to the technical report.
>> >> >
>> >> > 3. Error messages
>> >> > java.util.regex.PatternSyntaxException: unmatched ) near index: 1
>> >> > b)a
>> >> > ^
>> >> > java.util.regex.PatternSyntaxException: unmatched ) near index: 4
>> >> > bcde)a
>> >> > ^
>> >> > java.util.regex.PatternSyntaxException: unmatched ) near index: 5
>> >> > bbg())a
>> >> > ^
>> >> > java.util.regex.PatternSyntaxException: unmatched ) near index: 7
>> >> > cdb(?i))a
>> >> > ^
>> >> > are printed in the testCompileStringint().
>> >> > This test is needed to verify that appropriate exceptions are 
>> thrown
>> >> > if we compile a wrong builded regular expression.
>> >> >
>> >> > Thanks,
>> >> > Anton
>> >> >
>> >> > On 10/12/06, Spark Shen <sm...@gmail.com> wrote:
>> >> >>
>> >> >> Anton Ivanov wrote:
>> >> >> > On 10/10/06, Anton Ivanov <an...@gmail.com> wrote:
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> On 10/10/06, Tim Ellison <t....@gmail.com> wrote:
>> >> >> >> >
>> >> >> >> > So I checked in a patch for HARMONY-688's regex fix, and it
>> >> passed
>> >> >> the
>> >> >> >> > regex unit tests, but causes the existing luni tests to 
>> fail in
>> >> >> >> > java.util.Scanner. I've not figured out the base cause of the
>> >> >> failure
>> >> >> >> > so I've backed out the changes.
>> >> >> >> >
>> >> >> >> > Regards,
>> >> >> >> > Tim
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> >
>> >> >> >> > Tim Ellison (t.p.ellison@gmail.com )
>> >> >> >> > IBM Java technology centre, UK.
>> >> >> >> >
>> >> >> >> >
>> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> This is my patch.
>> >> >> >> I'll look into this problem and try to correct the patch.
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> Anton
>> >> >> >>
>> >> >> > There was a bug in the newly created class SupplRangeSet.java.
>> >> >> > There was the following code in the method matches() of
>> >> >> > SupplRangeSet.java:
>> >> >> > ...
>> >> >> > if (stringIndex < strLength) {
>> >> >> > char high = testString.charAt(stringIndex++);
>> >> >> >
>> >> >> > if (contains(high) &&
>> >> >> > next.matches(stringIndex, testString, matchResult) > 0)
>> >> >> > {
>> >> >> > return 1;
>> >> >> > }
>> >> >> > ...
>> >> >> > But it is wrong simply to return 1, though we can read about
>> method
>> >> >> > matches() in AbstractSet.java comments:
>> >> >> >
>> >> >> > "Checks if this node matches in given position and recursively
>> call
>> >> >> > next node matches on positive self match. Returns positive
>> >> integer if
>> >> >> > entire match succeed, negative otherwise
>> >> >> > return -1 if match fails or n > 0;"
>> >> >> > In fact method matches() returns not only a positive n > 0. 
>> The n
>> >> >> is an
>> >> >> > offset in case of a positive
>> >> >> > match attempt. This fact is took into account in all old classes
>> of
>> >> >> > java.util.regex, but I forgot this fact in SupplRangeSet.java
>> >> >> > So I corrected method matches() of the SupplRangeSet class as
>> >> follows:
>> >> >> > ...
>> >> >> > int offset = -1;
>> >> >> > if (stringIndex < strLength) {
>> >> >> > char high = testString.charAt(stringIndex++);
>> >> >> >
>> >> >> > if (contains(high) &&
>> >> >> > (offset = next.matches(stringIndex, testString,
>> >> >> > matchResult)) > 0) {
>> >> >> > return offset;
>> >> >> > }
>> >> >> > ...
>> >> >> > I corrected the patch and attached it to the issue.
>> >> >> > I verified that regex and luni tests pass normally with the 
>> patch
>> >> >> > applied.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Anton
>> >> >> >
>> >> >> Hi Anton:
>> >> >> It must be very excited to handle such a complex problem. :-)
>> >> >>
>> >> >> But after applying the new patch (and test patch applied), I still
>> >> got
>> >> >> problems:
>> >> >> Of test class:
>> >> org.apache.harmony.tests.java.util.regex.PatternTest, 4
>> >> >> test methods fail on RI:
>> >> >> testCanonEqFlag:
>> >> >> java.util.regex.PatternSyntaxException: Unclosed group near 
>> index 59
>> >> >> (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
>> >> >> ^
>> >> >> at java.util.regex.Pattern.error(Pattern.java:1650)
>> >> >> at java.util.regex.Pattern.accept(Pattern.java:1508)
>> >> >> at java.util.regex.Pattern.group0(Pattern.java:2460)
>> >> >> at java.util.regex.Pattern.sequence(Pattern.java:1715)
>> >> >> at java.util.regex.Pattern.expr(Pattern.java:1687)
>> >> >> at java.util.regex.Pattern.compile(Pattern.java:1397)
>> >> >> at java.util.regex.Pattern.<init>(Pattern.java:1124)
>> >> >> at java.util.regex.Pattern.compile(Pattern.java:840)
>> >> >> at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)
>> >> >>
>> >> >> testIndexesCanonicalEq:
>> >> >> junit.framework.AssertionFailedError: null
>> >> >> at junit.framework.Assert.fail(Assert.java:47)
>> >> >> at junit.framework.Assert.assertTrue(Assert.java:20)
>> >> >> at junit.framework.Assert.assertTrue(Assert.java:27)
>> >> >> at org.apache.harmony.tests.java.util.regex.PatternTest.testIndexesCanonicalEq(PatternTest.java:1247)
>> >> >>
>> >> >> testCanonEqFlagWithSupplementaryCharacters:
>> >> >> junit.framework.AssertionFailedError: null
>> >> >> at junit.framework.Assert.fail(Assert.java:47)
>> >> >> at junit.framework.Assert.assertTrue(Assert.java:20)
>> >> >> at junit.framework.Assert.assertTrue(Assert.java:27)
>> >> >> at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlagWithSupplementaryCharacters(PatternTest.java:1275)
>> >> >>
>> >> >> testPredefinedClassesWithSurrogatesSupplementary
>> >> >> junit.framework.AssertionFailedError: null
>> >> >> at junit.framework.Assert.fail(Assert.java:47)
>> >> >> at junit.framework.Assert.assertTrue(Assert.java:20)
>> >> >> at junit.framework.Assert.assertFalse(Assert.java:34)
>> >> >> at junit.framework.Assert.assertFalse(Assert.java:41)
>> >> >> at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)
>> >> >> If they are the bugs of RI, shall we add comments for them in the
>> >> test
>> >> >> case?
>> >> >>
>> >> >> and Error message printed out on console on Harmony. Since 
>> there are
>> >> >> test cases use System.out instead of assert, I could not locate
>> where
>> >> >> these error message comes from:
>> >> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
>> >> >> b)a
>> >> >> ^
>> >> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
>> >> >> bcde)a
>> >> >> ^
>> >> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
>> >> >> bbg())a
>> >> >> ^
>> >> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
>> >> >> cdb(?i))a
>> >> >> ^
>> >> >> And last, the good news is luni tests do pass. :-)
>> >> >>
>> >> >> Best regards
>> >> >>
>> >> >> --
>> >> >> Spark Shen
>> >> >> China Software Development Lab, IBM
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >>
>> >>
>> >> --
>> >> Paulex Yang
>> >> China Software Development Lab
>> >> IBM
>> >>
>> >>
>> >>
>> >>
>> >>
>>
>>
>> -- 
>> Spark Shen
>> China Software Development Lab, IBM
>>
>>
>>
>>


-- 
Spark Shen
China Software Development Lab, IBM




Re: [classlib][regex|luni] build break

Posted by Anton Ivanov <an...@gmail.com>.
Hi Spark.

I think there is only one problem :)

Namely, one unit test in PatternTest simply uses System.out.println().
As a result, you see the output with all these
java.util.regex.PatternSyntaxExceptions.
This test output was left in PatternTest by mistake.
I agree that such tests can cause confusion about which tests fail and
which do not.
So I removed this debug output. I attached an updated patch for the unit
tests, with this small fix, to the Unicode supplementary characters issue.
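
As an illustration of checking for the exception instead of printing it, here is a minimal sketch in plain Java. It is not the actual PatternTest code; the class name SyntaxErrorCheckSketch and the helper rejects are hypothetical. In a JUnit test the same idea would usually be written as a fail() call right after Pattern.compile(), inside a try block that catches PatternSyntaxException.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorCheckSketch {
    // Hypothetical helper: true only if compiling the regex is rejected
    // with a PatternSyntaxException.
    static boolean rejects(String regex) {
        try {
            Pattern.compile(regex);
            return false; // compiled without complaint
        } catch (PatternSyntaxException expected) {
            return true;  // the failure we want, reported silently
        }
    }

    public static void main(String[] args) {
        // The four malformed patterns whose messages showed up on the console.
        String[] malformed = { "b)a", "bcde)a", "bbg())a", "cdb(?i))a" };
        for (String regex : malformed) {
            if (!rejects(regex)) {
                throw new AssertionError("expected PatternSyntaxException for: " + regex);
            }
        }
        System.out.println("all malformed patterns were rejected");
    }
}
```

With this shape nothing is written to the console on success, and a pattern that unexpectedly compiles shows up as a test failure rather than as missing output.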


Thanks,
Anton
On 10/13/06, Spark Shen <sm...@gmail.com> wrote:
>
> Hi Anton:
>
> There are still two problems here:
>
> 1. Error message printed out on Harmony in console:
> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
> b)a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
> bcde)a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
> bbg())a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
> cdb(?i))a
>
> 2. Some test cases in PatternTest simply use System.out.println()
> instead of assertions, so failed test cases
> cannot easily be found in the JUnit output
>
> Best regards
>
> Anton Ivanov wrote:
> > I documented the details on both JIRA issues:
> > http://issues.apache.org/jira/browse/HARMONY-688
> > http://issues.apache.org/jira/browse/HARMONY-933
> > So, please mark these issues as non-bug-differences if needed.
> >
> > Thanks,
> > Anton
> >
> > On 10/12/06, Paulex Yang <pa...@gmail.com> wrote:
> >>
> >> Anton Ivanov wrote:
> >> > The problem is in the RI. These failures are the RI bugs.
> >> >
> >> > The test failures on the RI you pointed out can be grouped into the
> >> two
> >> I guess you meant three ;-)
> >> > categories:
> >> Is category2, the supplemental character issue, included in the
> >> HARMONY-933? How about to document the details like below on that JIRA,
> >> and mark it as non-bug difference?
> >> >
> >> > 1. Canonical equivalence related.
> >> >
> >> > java.util.regex.PatternSyntaxException: Unclosed group near index 59
> >> > (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
> >> > ^
> >> > at java.util.regex.Pattern.error(Pattern.java:1650)
> >> > at java.util.regex.Pattern.accept(Pattern.java:1508)
> >> > at java.util.regex.Pattern.group0(Pattern.java:2460)
> >> > at java.util.regex.Pattern.sequence(Pattern.java:1715)
> >> > at java.util.regex.Pattern.expr(Pattern.java:1687)
> >> > at java.util.regex.Pattern.compile(Pattern.java:1397)
> >> > at java.util.regex.Pattern.<init>(Pattern.java:1124)
> >> > at java.util.regex.Pattern.compile(Pattern.java:840)
> >> > at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)
> >> >
> >> > The RI fails to compile the following pattern with the CANON_EQ flag
> >> > specified:
> >> >       "\u01E0\u00CCcdb(ac)"
> >> > This is because the RI tries to build alternations to take canonical
> >> > equivalence into account.
> >> > The RI does this correctly in simple cases, but if the pattern is a
> >> > little more complex, the RI fails to compile it.
> >> > So the RI builds these alternations incorrectly.
> >> > See the following bug:
> >> > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4867170
> >> >
> >> > I wrote about these test failures on the RI here:
> >> > http://issues.apache.org/jira/browse/HARMONY-933
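
What the CANON_EQ flag is meant to do in a simple case can be sketched as follows, using the example from the java.util.regex.Pattern documentation (a base letter plus a combining mark versus the precomposed letter). The class name CanonEqSketch and its helpers are hypothetical. The last part only tries to compile the pattern quoted above and prints what happens; the outcome depends on the Java runtime, so no result is claimed for it here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CanonEqSketch {
    // Hypothetical helper: full match with canonical equivalence enabled.
    static boolean matchesCanonEq(String regex, String input) {
        return Pattern.compile(regex, Pattern.CANON_EQ).matcher(input).matches();
    }

    // Hypothetical helper: full match with default flags.
    static boolean matchesPlain(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A";  // letter a followed by a combining ring above
        String precomposed = "\u00E5";  // the same letter as one precomposed character

        System.out.println(matchesPlain(decomposed, precomposed));   // false
        System.out.println(matchesCanonEq(decomposed, precomposed)); // true

        // The pattern from the thread; behaviour is runtime-dependent.
        try {
            Pattern.compile("\u01E0\u00CCcdb(ac)", Pattern.CANON_EQ);
            System.out.println("thread pattern compiled");
        } catch (PatternSyntaxException e) {
            System.out.println("thread pattern rejected: " + e.getDescription());
        }
    }
}
```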
> >> >
> >> > 2. Supplementary Unicode codepoints related.
> >> >
> >> > For example let's see at:
> >> >
> >> > testPredefinedClassesWithSurrogatesSupplementary
> >> > junit.framework.AssertionFailedError: null
> >> > at junit.framework.Assert.fail(Assert.java:47)
> >> > at junit.framework.Assert.assertTrue(Assert.java:20)
> >> > at junit.framework.Assert.assertFalse(Assert.java:34)
> >> > at junit.framework.Assert.assertFalse(Assert.java:41)
> >> > at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)
> >> >
> >> > Here we try to find a surrogate character in the code point \uD916\uDE27.
> >> > The following is written here:
> >> > http://www.unicode.org/reports/tr18/#Supplementary_Characters
> >> >
> >> > "Surrogate pairs (or their equivalents in other encoding forms) are
> >> > to be handled internally as single code point values"
> >> >
> >> > So we have to treat text as code points, not code units.
> >> > Here \uD916\uDE27 is a single code point consisting of
> >> > two code units (two surrogate characters), so we find nothing.
> >> > (I added a comment with this explanation to
> >> > testPredefinedClassesWithSurrogatesSupplementary().)
> >> > But the RI doesn't treat this code point as a single unit; this is an
> >> > RI bug, and it is wrong according to the technical report.
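
The code unit versus code point distinction for this particular pair can be shown with the standard String and Character methods. This is a sketch only; it deliberately does not exercise the regex engine, whose behaviour is the point under dispute, and the class name CodePointSketch and its helpers are made up for the example.

```java
class CodePointSketch {
    // The surrogate pair discussed in the thread.
    static final String PAIR = "\uD916\uDE27";

    // Number of UTF-16 code units (Java chars).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(PAIR));  // 2
        System.out.println(codePoints(PAIR)); // 1

        // Each char on its own is a surrogate, and together they form a pair.
        System.out.println(Character.isHighSurrogate(PAIR.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(PAIR.charAt(1)));  // true

        // The single supplementary code point the pair encodes.
        int cp = PAIR.codePointAt(0);
        System.out.println(Integer.toHexString(cp));              // 55a27
        System.out.println(Character.isSupplementaryCodePoint(cp)); // true
    }
}
```

A matcher that walks the text by code point sees one supplementary character here, while one that walks by char sees two surrogates, which is the difference described above.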
> >> >
> >> > 3. Error messages
> >> > java.util.regex.PatternSyntaxException: unmatched ) near index: 1
> >> > b)a
> >> > ^
> >> > java.util.regex.PatternSyntaxException: unmatched ) near index: 4
> >> > bcde)a
> >> > ^
> >> > java.util.regex.PatternSyntaxException: unmatched ) near index: 5
> >> > bbg())a
> >> > ^
> >> > java.util.regex.PatternSyntaxException: unmatched ) near index: 7
> >> > cdb(?i))a
> >> > ^
> >> > are printed in testCompileStringint().
> >> > This test is needed to verify that the appropriate exceptions are thrown
> >> > when we compile a malformed regular expression.
> >> >
> >> > Thanks,
> >> > Anton
> >> >
> >> > On 10/12/06, Spark Shen <sm...@gmail.com> wrote:
> >> >>
> >> >> Anton Ivanov wrote:
> >> >> > On 10/10/06, Anton Ivanov <an...@gmail.com> wrote:
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On 10/10/06, Tim Ellison <t....@gmail.com> wrote:
> >> >> >> >
> >> >> >> > So I checked in a patch for HARMONY-688's regex fix, and it
> >> passed
> >> >> the
> >> >> >> > regex unit tests, but causes the existing luni tests to fail in
> >> >> >> > java.util.Scanner. I've not figured out the base cause of the
> >> >> failure
> >> >> >> > so I've backed out the changes.
> >> >> >> >
> >> >> >> > Regards,
> >> >> >> > Tim
> >> >> >> >
> >> >> >> > --
> >> >> >> >
> >> >> >> > Tim Ellison (t.p.ellison@gmail.com )
> >> >> >> > IBM Java technology centre, UK.
> >> >> >> >
> >> >> >> >
> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> This is my patch.
> >> >> >> I'll look into this problem and try to correct the patch.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Anton
> >> >> >>
> >> >> > There was a bug in the newly created class SupplRangeSet.java.
> >> >> > The method matches() of SupplRangeSet.java contained the following
> >> >> > code:
> >> >> > ...
> >> >> > if (stringIndex < strLength) {
> >> >> >     char high = testString.charAt(stringIndex++);
> >> >> >
> >> >> >     if (contains(high) &&
> >> >> >         next.matches(stringIndex, testString, matchResult) > 0)
> >> >> >     {
> >> >> >         return 1;
> >> >> >     }
> >> >> > ...
> >> >> > But it is wrong simply to return 1, even though the comments in
> >> >> > AbstractSet.java describe the method matches() like this:
> >> >> >
> >> >> > "Checks if this node matches in given position and recursively call
> >> >> > next node matches on positive self match. Returns positive integer if
> >> >> > entire match succeed, negative otherwise
> >> >> > return -1 if match fails or n > 0;"
> >> >> > In fact, matches() does not return just any positive n > 0: n is the
> >> >> > offset in the case of a successful match attempt. All the old classes
> >> >> > of java.util.regex take this into account, but I forgot it in
> >> >> > SupplRangeSet.java.
> >> >> > So I corrected the matches() method of the SupplRangeSet class as
> >> >> > follows:
> >> >> > ...
> >> >> > int offset = -1;
> >> >> > if (stringIndex < strLength) {
> >> >> >     char high = testString.charAt(stringIndex++);
> >> >> >
> >> >> >     if (contains(high) &&
> >> >> >         (offset = next.matches(stringIndex, testString,
> >> >> >             matchResult)) > 0) {
> >> >> >         return offset;
> >> >> >     }
> >> >> > ...
> >> >> > I corrected the patch and attached it to the issue.
> >> >> > I verified that regex and luni tests pass normally with the patch
> >> >> > applied.
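
Why the returned value matters can be sketched with a tiny stand-in for a chain of matcher nodes. This is not Harmony's AbstractSet or SupplRangeSet; the Node interface, the digit factory and the propagateOffset switch are all hypothetical and exist only to contrast "return 1" with "return the offset reported by the next node".

```java
class MatchChainSketch {
    // Hypothetical stand-in for a regex node; not Harmony's AbstractSet.
    interface Node {
        // Returns the offset reported by the rest of the chain, or -1 on failure.
        int matches(int index, String text);
    }

    // Terminal node: always succeeds and reports where the match ended.
    static final Node END = (index, text) -> index;

    // Matches one digit and then delegates to the next node. With
    // propagateOffset == false it mimics the bug by returning the constant 1.
    static Node digit(Node next, boolean propagateOffset) {
        return (index, text) -> {
            if (index < text.length() && Character.isDigit(text.charAt(index))) {
                int offset = next.matches(index + 1, text);
                if (offset > 0) {
                    return propagateOffset ? offset : 1;
                }
            }
            return -1;
        };
    }

    // Builds a chain of three digit nodes and runs it against "123".
    static int run(boolean propagateOffset) {
        Node chain = digit(digit(digit(END, propagateOffset), propagateOffset), propagateOffset);
        return chain.matches(0, "123");
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // 1: positive, but not the end of the match
        System.out.println(run(true));  // 3: the real end offset
    }
}
```

Both results are positive, so a caller that only checks for success sees no difference, while a caller that uses the value as the end position of the match is misled by the first variant. That may be how a digit-matching pattern could pass simple checks and still break callers that rely on the offset.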
> >> >> >
> >> >> > Thanks,
> >> >> > Anton
> >> >> >
> >> >> Hi Anton:
> >> >> It must be very exciting to handle such a complex problem. :-)
> >> >>
> >> >> But after applying the new patch (with the test patch also applied), I
> >> >> still have problems:
> >> >> In the test class org.apache.harmony.tests.java.util.regex.PatternTest,
> >> >> 4 test methods fail on the RI:
> >> >> testCanonEqFlag:
> >> >> java.util.regex.PatternSyntaxException: Unclosed group near index 59
> >> >> (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
> >> >> ^
> >> >> at java.util.regex.Pattern.error(Pattern.java:1650)
> >> >> at java.util.regex.Pattern.accept(Pattern.java:1508)
> >> >> at java.util.regex.Pattern.group0(Pattern.java:2460)
> >> >> at java.util.regex.Pattern.sequence(Pattern.java:1715)
> >> >> at java.util.regex.Pattern.expr(Pattern.java:1687)
> >> >> at java.util.regex.Pattern.compile(Pattern.java:1397)
> >> >> at java.util.regex.Pattern.<init>(Pattern.java:1124)
> >> >> at java.util.regex.Pattern.compile(Pattern.java:840)
> >> >> at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)
> >> >>
> >> >> testIndexesCanonicalEq:
> >> >> junit.framework.AssertionFailedError: null
> >> >> at junit.framework.Assert.fail(Assert.java:47)
> >> >> at junit.framework.Assert.assertTrue(Assert.java:20)
> >> >> at junit.framework.Assert.assertTrue(Assert.java:27)
> >> >> at org.apache.harmony.tests.java.util.regex.PatternTest.testIndexesCanonicalEq(PatternTest.java:1247)
> >> >>
> >> >> testCanonEqFlagWithSupplementaryCharacters:
> >> >> junit.framework.AssertionFailedError: null
> >> >> at junit.framework.Assert.fail(Assert.java:47)
> >> >> at junit.framework.Assert.assertTrue(Assert.java:20)
> >> >> at junit.framework.Assert.assertTrue(Assert.java:27)
> >> >> at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlagWithSupplementaryCharacters(PatternTest.java:1275)
> >> >>
> >> >> testPredefinedClassesWithSurrogatesSupplementary
> >> >> junit.framework.AssertionFailedError: null
> >> >> at junit.framework.Assert.fail(Assert.java:47)
> >> >> at junit.framework.Assert.assertTrue(Assert.java:20)
> >> >> at junit.framework.Assert.assertFalse(Assert.java:34)
> >> >> at junit.framework.Assert.assertFalse(Assert.java:41)
> >> >> at
> >> >>
> >> >>
> >>
> org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary
> >>
> >> >>
> >> >> (PatternTest.java:1477)
> >> >> If they are the bugs of RI, shall we add comments for them in the
> >> test
> >> >> case?
> >> >>
> >> >> and Error message printed out on console on Harmony. Since there are
> >> >> test cases use System.out instead of assert, I could not locate
> where
> >> >> these error message comes from:
> >> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
> >> >> b)a
> >> >> ^
> >> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
> >> >> bcde)a
> >> >> ^
> >> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
> >> >> bbg())a
> >> >> ^
> >> >> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
> >> >> cdb(?i))a
> >> >> ^
> >> >> And last, the good news is luni tests do pass. :-)
> >> >>
> >> >> Best regards
> >> >>
> >> >> --
> >> >> Spark Shen
> >> >> China Software Development Lab, IBM
> >> >>
> >> >>
> >> >>
> ---------------------------------------------------------------------
> >> >> Terms of use : http://incubator.apache.org/harmony/mailing.html
> >> >> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> >> >> For additional commands, e-mail:
> >> harmony-dev-help@incubator.apache.org
> >> >>
> >> >>
> >>
> >>
> >> --
> >> Paulex Yang
> >> China Software Development Lab
> >> IBM
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> Terms of use : http://incubator.apache.org/harmony/mailing.html
> >> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> >> For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> >>
> >>
>
>
> --
> Spark Shen
> China Software Development Lab, IBM
>
>
> ---------------------------------------------------------------------
> Terms of use : http://incubator.apache.org/harmony/mailing.html
> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>
>

Re: [classlib][regex|luni] build break

Posted by Spark Shen <sm...@gmail.com>.
Hi Anton:

There are still two problems here:

1. Error messages are still printed to the console on Harmony:
java.util.regex.PatternSyntaxException: unmatched ) near index: 1
b)a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 4
bcde)a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 5
bbg())a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 7
cdb(?i))a

2. Some test cases in PatternTest simply use System.out.println()
instead of assertions, so failed test cases cannot easily be
found in the JUnit output.
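
To illustrate point 2, here is a minimal sketch of how the malformed patterns from point 1 could be checked with an explicit failure instead of console output. The class and method names are hypothetical and are not taken from PatternTest; in a JUnit test the same idea would be written as a try block that calls fail() when no exception is thrown.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of the Harmony test suite.
final class UnmatchedParenCheck {

    /** Returns true only if compiling the regex throws PatternSyntaxException. */
    static boolean failsToCompile(String regex) {
        try {
            Pattern.compile(regex);
            return false; // compiled without complaint
        } catch (PatternSyntaxException expected) {
            return true;  // the malformed pattern was rejected
        }
    }

    public static void main(String[] args) {
        // The four patterns that show up in the console output above.
        String[] malformed = { "b)a", "bcde)a", "bbg())a", "cdb(?i))a" };
        for (String regex : malformed) {
            if (!failsToCompile(regex)) {
                // A silent println would hide this; an error makes the run fail.
                throw new AssertionError("expected PatternSyntaxException for: " + regex);
            }
        }
        System.out.println("all malformed patterns were rejected");
    }
}
```

With a check of this shape, a regression shows up as a failed test rather than as an unexplained line on the console.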

Best regards

Anton Ivanov wrote:
> I documented the details on both JIRA issues:
> http://issues.apache.org/jira/browse/HARMONY-688
> http://issues.apache.org/jira/browse/HARMONY-933
> So, please mark these issues as non-bug-differences if needed.
>
> Thanks,
> Anton


-- 
Spark Shen
China Software Development Lab, IBM




Re: [classlib][regex|luni] build break

Posted by Anton Ivanov <an...@gmail.com>.
I documented the details on both JIRA issues:
http://issues.apache.org/jira/browse/HARMONY-688
http://issues.apache.org/jira/browse/HARMONY-933
So, please mark these issues as non-bug differences if needed.

Thanks,
Anton


Re: [classlib][regex|luni] build break

Posted by Paulex Yang <pa...@gmail.com>.
Anton Ivanov wrote:
> The problem is in the RI. These failures are the RI bugs.
>
> The test failures on the RI you pointed out can be grouped into the two
I guess you meant three ;-)
> categories:
Is category 2, the supplementary character issue, covered by
HARMONY-933? How about documenting the details below on that JIRA
and marking it as a non-bug difference?
>
> 1. Canonical equivalence related.
>
> java.util.regex.PatternSyntaxException: Unclosed group near index 59
> (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
> ^
> at java.util.regex.Pattern.error(Pattern.java:1650)
> at java.util.regex.Pattern.accept(Pattern.java:1508)
> at java.util.regex.Pattern.group0(Pattern.java:2460)
> at java.util.regex.Pattern.sequence(Pattern.java:1715)
> at java.util.regex.Pattern.expr(Pattern.java:1687)
> at java.util.regex.Pattern.compile(Pattern.java:1397)
> at java.util.regex.Pattern.<init>(Pattern.java:1124)
> at java.util.regex.Pattern.compile(Pattern.java:840)
> at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)
>
> The RI fails to compile the following pattern when the CANON_EQ flag is
> specified:
>       "\u01E0\u00CCcdb(ac)"
> This happens because the RI tries to build alternations to take
> canonical equivalence into account.
> The RI manages this in simple cases, but if the pattern is a little more
> complex the RI fails to compile it.
> So the RI builds these alternations incorrectly.
> See the following bug:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4867170
>
> I wrote about these test failures on the RI here:
> http://issues.apache.org/jira/browse/HARMONY-933
>
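
As background for category 1, here is a small sketch of what canonical equivalence means for a simple case. It uses the precomposed/decomposed pair from the Pattern.CANON_EQ Javadoc rather than the failing pattern above, and the class and helper names are hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical demo class; it only illustrates the idea of canonical equivalence.
final class CanonEqSketch {

    /** Two strings are canonically equivalent if they normalize to the same NFC form. */
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    /** Full-input match of regex against input with the given Pattern flags. */
    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A"; // 'a' followed by COMBINING RING ABOVE
        String precomposed = "\u00E5"; // LATIN SMALL LETTER A WITH RING ABOVE

        System.out.println("canonically equivalent: " + sameAfterNfc(decomposed, precomposed));
        // Without CANON_EQ the two forms are compared code point by code point.
        System.out.println("match without flag:     " + matches(decomposed, precomposed, 0));
        // With CANON_EQ the engine is expected to treat the two forms as equal.
        System.out.println("match with CANON_EQ:    "
                + matches(decomposed, precomposed, Pattern.CANON_EQ));
    }
}
```

An engine has to account for every equivalent spelling of each character in the pattern, which is where the alternations mentioned above come from.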
> 2. Supplementary Unicode codepoints related.
>
> For example, let's look at:
>
> testPredefinedClassesWithSurrogatesSupplementary
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:47)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertFalse(Assert.java:34)
> at junit.framework.Assert.assertFalse(Assert.java:41)
> at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)
>
> Here we try to find a surrogate character in the code point \uD916\uDE27.
> As stated here:
> http://www.unicode.org/reports/tr18/#Supplementary_Characters
>
> "Surrogate pairs (or their equivalents in other encoding forms) are to be
> handled internally as single code point values"
>
> So we have to treat text as code points, not code units.
> Here \uD916\uDE27 is a single code point consisting of
> two code units (two surrogate characters), so we find nothing.
> (I added a comment with this explanation to
> testPredefinedClassesWithSurrogatesSupplementary().)
> But the RI doesn't treat this code point as a single unit; this is an RI
> bug, and it is wrong according to the technical report.
>
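
The distinction between code units and code points in category 2 can be seen with plain String and Character calls. This is only a sketch using the pair from the test; the class and helper names are hypothetical.

```java
// Hypothetical demo class; it uses only java.lang APIs.
final class SurrogatePairSketch {

    // The supplementary character from the test, written as two UTF-16 code units.
    static final String PAIR = "\uD916\uDE27";

    /** Number of UTF-16 code units (Java chars) in the string. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println("code units:  " + codeUnits(PAIR));
        System.out.println("code points: " + codePoints(PAIR));
        System.out.println("high surrogate first: " + Character.isHighSurrogate(PAIR.charAt(0)));
        System.out.println("low surrogate second: " + Character.isLowSurrogate(PAIR.charAt(1)));
        System.out.println("code point: U+"
                + Integer.toHexString(PAIR.codePointAt(0)).toUpperCase());
    }
}
```

A matcher that walks the text by code point sees one supplementary character here, so a search for a lone surrogate finds nothing; a matcher that walks by char sees two surrogates.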
> 3. Error messages
> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
> b)a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
> bcde)a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
> bbg())a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
> cdb(?i))a
> ^
> are printed in testCompileStringint().
> This test is needed to verify that the appropriate exceptions are thrown
> when we compile an incorrectly built regular expression.
>
> Thanks,
> Anton
>
> On 10/12/06, Spark Shen <sm...@gmail.com> wrote:
>>
>> Anton Ivanov wrote:
>> > On 10/10/06, Anton Ivanov <an...@gmail.com> wrote:
>> >>
>> >>
>> >>
>> >> On 10/10/06, Tim Ellison <t....@gmail.com> wrote:
>> >> >
>> >> > So I checked in a patch for HARMONY-688's regex fix, and it passed the
>> >> > regex unit tests, but causes the existing luni tests to fail in
>> >> > java.util.Scanner. I've not figured out the base cause of the failure
>> >> > so I've backed out the changes.
>> >> >
>> >> > Regards,
>> >> > Tim
>> >> >
>> >> > --
>> >> >
>> >> > Tim Ellison (t.p.ellison@gmail.com )
>> >> > IBM Java technology centre, UK.
>> >> >
>> >> > 
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> This is my patch.
>> >> I'll look into this problem and try to correct the patch.
>> >>
>> >> Thanks,
>> >> Anton
>> >>
>> > There was a bug in the newly created class SupplRangeSet.java.
>> > Its matches() method contained the following code:
>> > ...
>> > if (stringIndex < strLength) {
>> > char high = testString.charAt(stringIndex++);
>> >
>> > if (contains(high) &&
>> > next.matches(stringIndex, testString, matchResult) > 0)
>> > {
>> > return 1;
>> > }
>> > ...
>> > But it is wrong simply to return 1, even though the comments on
>> > matches() in AbstractSet.java read:
>> >
>> > "Checks if this node matches in given position and recursively call
>> > next node matches on positive self match. Returns positive integer if
>> > entire match succeed, negative otherwise
>> > return -1 if match fails or n > 0;"
>> > In fact, matches() does not return just any positive n > 0: n is the
>> > offset reached by a successful match attempt. All the old classes of
>> > java.util.regex take this into account, but I forgot it in
>> > SupplRangeSet.java.
>> > So I corrected the matches() method of the SupplRangeSet class as follows:
>> > ...
>> > int offset = -1;
>> > if (stringIndex < strLength) {
>> > char high = testString.charAt(stringIndex++);
>> >
>> > if (contains(high) &&
>> > (offset = next.matches(stringIndex, testString,
>> > matchResult)) > 0) {
>> > return offset;
>> > }
>> > ...
>> > I corrected the patch and attached it to the issue.
>> > I verified that regex and luni tests pass normally with the patch
>> > applied.
>> >
>> > Thanks,
>> > Anton
>> >
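
The difference between returning 1 and returning the offset can be shown with a toy chain of matcher nodes. This is a sketch under stated assumptions: Node, CharNode and End are hypothetical stand-ins and not Harmony's AbstractSet hierarchy, and the toy uses -1 for failure and the end offset for success.

```java
import java.util.function.IntPredicate;

// Hypothetical toy model of chained regex nodes; not Harmony's implementation.
final class MatchOffsetSketch {

    abstract static class Node {
        Node next;
        /** Returns the offset where the whole remaining chain stopped matching, or -1. */
        abstract int matches(int index, CharSequence text);
    }

    /** Terminal node: always succeeds and reports the current offset. */
    static final class End extends Node {
        @Override
        int matches(int index, CharSequence text) {
            return index;
        }
    }

    /** Matches one char against a set, then delegates to the next node. */
    static final class CharNode extends Node {
        private final IntPredicate set;
        private final boolean propagateOffset;

        CharNode(IntPredicate set, boolean propagateOffset, Node next) {
            this.set = set;
            this.propagateOffset = propagateOffset;
            this.next = next;
        }

        @Override
        int matches(int index, CharSequence text) {
            if (index >= text.length() || !set.test(text.charAt(index))) {
                return -1;
            }
            int offset = next.matches(index + 1, text);
            if (offset < 0) {
                return -1;
            }
            // Returning 1 mimics the original bug; returning offset mimics the fix.
            return propagateOffset ? offset : 1;
        }
    }

    /** Builds one digit node per input char and runs the chain from offset 0. */
    static int run(String text, boolean propagateOffset) {
        Node chain = new End();
        for (int i = 0; i < text.length(); i++) {
            chain = new CharNode(Character::isDigit, propagateOffset, chain);
        }
        return chain.matches(0, text);
    }

    public static void main(String[] args) {
        System.out.println("with offset propagated: " + run("123", true));
        System.out.println("with constant 1:        " + run("123", false));
    }
}
```

In this toy model a caller that uses the return value as the end position of the match gets 3 for "123" when the offset is propagated, but 1 when a node returns the constant.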
>> Hi Anton:
>> It must be very exciting to handle such a complex problem. :-)
>>
>> But after applying the new patch (with the test patch applied as well), I
>> still have problems.
>> In the test class org.apache.harmony.tests.java.util.regex.PatternTest, 4
>> test methods fail on the RI:
>> testCanonEqFlag:
>> java.util.regex.PatternSyntaxException: Unclosed group near index 59
>> (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
>> ^
>> at java.util.regex.Pattern.error(Pattern.java:1650)
>> at java.util.regex.Pattern.accept(Pattern.java:1508)
>> at java.util.regex.Pattern.group0(Pattern.java:2460)
>> at java.util.regex.Pattern.sequence(Pattern.java:1715)
>> at java.util.regex.Pattern.expr(Pattern.java:1687)
>> at java.util.regex.Pattern.compile(Pattern.java:1397)
>> at java.util.regex.Pattern.<init>(Pattern.java:1124)
>> at java.util.regex.Pattern.compile(Pattern.java:840)
>> at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)
>>
>> testIndexesCanonicalEq:
>> junit.framework.AssertionFailedError: null
>> at junit.framework.Assert.fail(Assert.java:47)
>> at junit.framework.Assert.assertTrue(Assert.java:20)
>> at junit.framework.Assert.assertTrue(Assert.java:27)
>> at
>>
>> org.apache.harmony.tests.java.util.regex.PatternTest.testIndexesCanonicalEq 
>>
>> (PatternTest.java:1247)
>>
>> testCanonEqFlagWithSupplementaryCharacters:
>> junit.framework.AssertionFailedError: null
>> at junit.framework.Assert.fail(Assert.java:47)
>> at junit.framework.Assert.assertTrue(Assert.java:20)
>> at junit.framework.Assert.assertTrue(Assert.java:27)
>> at
>>
>> org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlagWithSupplementaryCharacters 
>>
>> (PatternTest.java:1275)
>>
>> testPredefinedClassesWithSurrogatesSupplementary
>> junit.framework.AssertionFailedError: null
>> at junit.framework.Assert.fail(Assert.java:47)
>> at junit.framework.Assert.assertTrue(Assert.java:20)
>> at junit.framework.Assert.assertFalse(Assert.java:34)
>> at junit.framework.Assert.assertFalse(Assert.java:41)
>> at
>>
>> org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary 
>>
>> (PatternTest.java:1477)
>> If they are the bugs of RI, shall we add comments for them in the test
>> case?
>>
>> and Error message printed out on console on Harmony. Since there are
>> test cases use System.out instead of assert, I could not locate where
>> these error message comes from:
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
>> b)a
>> ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
>> bcde)a
>> ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
>> bbg())a
>> ^
>> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
>> cdb(?i))a
>> ^
>> And last, the good news is luni tests do pass. :-)
>>
>> Best regards
>>
>> -- 
>> Spark Shen
>> China Software Development Lab, IBM
>>
>>
>>
>>


-- 
Paulex Yang
China Software Development Lab
IBM





Re: [classlib][regex|luni] build break

Posted by Anton Ivanov <an...@gmail.com>.
The problem is in the RI: these failures are RI bugs.

The test failures on the RI that you pointed out fall into two
categories, followed by a note on the console error messages:

1. Canonical equivalence related.

java.util.regex.PatternSyntaxException: Unclosed group near index 59
(?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
^
at java.util.regex.Pattern.error(Pattern.java:1650)
at java.util.regex.Pattern.accept(Pattern.java:1508)
at java.util.regex.Pattern.group0(Pattern.java:2460)
at java.util.regex.Pattern.sequence(Pattern.java:1715)
at java.util.regex.Pattern.expr(Pattern.java:1687)
at java.util.regex.Pattern.compile(Pattern.java:1397)
at java.util.regex.Pattern.<init>(Pattern.java:1124)
at java.util.regex.Pattern.compile(Pattern.java:840)
at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)

The RI fails to compile the following pattern with the CANON_EQ flag specified:
       "\u01E0\u00CCcdb(ac)"
This happens because the RI tries to build alternations to take canonical
equivalence into account.
The RI does this correctly in simple cases, but if the pattern is a little more
complex the RI fails to compile it,
so the RI builds these alternations incorrectly.
See the following bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4867170

I wrote about these test failures on the RI here:
http://issues.apache.org/jira/browse/HARMONY-933
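
For readers who want to try this locally, here is a minimal sketch, not
part of the patch or of PatternTest. The class name CanonEqDemo and its helper
methods are made up for illustration. It first checks the simple case given in
the java.util.regex.Pattern Javadoc (a decomposed pattern matching a
precomposed input), then reports whether the pattern from testCanonEqFlag
compiles on the runtime at hand. The second result depends on the Java
implementation and version, so no outcome is assumed for it.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CanonEqDemo {
    /** Returns true if the whole input matches the regex under CANON_EQ. */
    static boolean matchesCanonEq(String regex, String input) {
        return Pattern.compile(regex, Pattern.CANON_EQ).matcher(input).matches();
    }

    /** Returns null if the regex compiles under CANON_EQ, otherwise the error text. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex, Pattern.CANON_EQ);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Simple case: "a" plus combining ring above versus precomposed a-with-ring.
        System.out.println("simple CANON_EQ match: "
                + matchesCanonEq("a\u030A", "\u00E5"));
        // The same regex without the flag compares code units only.
        System.out.println("without flag: "
                + Pattern.matches("a\u030A", "\u00E5"));
        // The pattern from testCanonEqFlag; the result depends on the runtime.
        String error = compileError("\u01E0\u00CCcdb(ac)");
        System.out.println(error == null ? "complex pattern compiled"
                                         : "complex pattern failed: " + error);
    }
}
```

Running it on both the RI and Harmony should show whether the two disagree on
the complex pattern while agreeing on the simple one.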

2. Supplementary Unicode codepoints related.

For example, let's look at:

testPredefinedClassesWithSurrogatesSupplementary
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertFalse(Assert.java:34)
at junit.framework.Assert.assertFalse(Assert.java:41)
at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)

Here we try to find a surrogate character in the code point \uD916\uDE27.
The Unicode regular expression report says
(http://www.unicode.org/reports/tr18/#Supplementary_Characters):

"Surrogate pairs (or their equivalents in other encoding forms) are to be
handled internally as single code point values"

So we have to treat text as code points, not code units.
Here \uD916\uDE27 is a single code point consisting of
two code units (two surrogate characters), so the search should find nothing.
(I added a comment with this explanation to
testPredefinedClassesWithSurrogatesSupplementary().)
But the RI doesn't treat this code point as a single unit. This is an RI bug,
and it is wrong according to the technical report.
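
The code unit versus code point distinction can be seen with the standard
java.lang.String and java.lang.Character methods alone. The sketch below is
only an illustration (the class CodePointDemo and its helpers are hypothetical
names, not taken from the test suite), and it deliberately makes no claim about
what any regex engine returns for this input.

```java
public class CodePointDemo {
    /** Number of UTF-16 code units (chars) in the string. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // The surrogate pair discussed above, written as two escaped code units.
        String s = "\uD916\uDE27";

        System.out.println("code units:  " + codeUnits(s));
        System.out.println("code points: " + codePoints(s));

        // Each char on its own is a surrogate...
        System.out.println("high surrogate: " + Character.isHighSurrogate(s.charAt(0)));
        System.out.println("low surrogate:  " + Character.isLowSurrogate(s.charAt(1)));

        // ...but together they encode one supplementary code point.
        int cp = s.codePointAt(0);
        System.out.println("supplementary:  " + Character.isSupplementaryCodePoint(cp));
        System.out.println("chars needed:   " + Character.charCount(cp));
    }
}
```

A matcher that works on code points sees one character here, so a search for
an isolated surrogate has nothing to find.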

3. Error messages
java.util.regex.PatternSyntaxException: unmatched ) near index: 1
b)a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 4
bcde)a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 5
bbg())a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 7
cdb(?i))a
^
are printed by testCompileStringint().
This test verifies that the appropriate exceptions are thrown
when we compile a malformed regular expression.

Thanks,
Anton

On 10/12/06, Spark Shen <sm...@gmail.com> wrote:
>
> Anton Ivanov wrote:
> > On 10/10/06, Anton Ivanov <an...@gmail.com> wrote:
> >>
> >>
> >>
> >> On 10/10/06, Tim Ellison <t....@gmail.com> wrote:
> >> >
> >> > So I checked in a patch for HARMONY-688's regex fix, and it passed
> the
> >> > regex unit tests, but causes the existing luni tests to fail in
> >> > java.util.Scanner. I've not figured out the base cause of the failure
> >> > so I've backed out the changes.
> >> >
> >> > Regards,
> >> > Tim
> >> >
> >> > --
> >> >
> >> > Tim Ellison (t.p.ellison@gmail.com )
> >> > IBM Java technology centre, UK.
> >> >
> >>
> >>
> >>
> >>
> >>
> >> This is my patch.
> >> I'll look into this problem and try to correct the patch.
> >>
> >> Thanks,
> >> Anton
> >>
> > There was a bug in the newly created class SupplRangeSet.java.
> > There was the following code in the method matches() of
> > SupplRangeSet.java:
> > ...
> > if (stringIndex < strLength) {
> > char high = testString.charAt(stringIndex++);
> >
> > if (contains(high) &&
> > next.matches(stringIndex, testString, matchResult) > 0)
> > {
> > return 1;
> > }
> > ...
> > But it is wrong simply to return 1, though we can read about method
> > matches() in AbstractSet.java comments:
> >
> > "Checks if this node matches in given position and recursively call
> > next node matches on positive self match. Returns positive integer if
> > entire match succeed, negative otherwise
> > return -1 if match fails or n > 0;"
> > In fact method matches() returns not only a positive n > 0. The n is an
> > offset in case of a positive
> > match attempt. This fact is took into account in all old classes of
> > java.util.regex, but I forgot this fact in SupplRangeSet.java
> > So I corrected method matches() of the SupplRangeSet class as follows:
> > ...
> > int offset = -1;
> > if (stringIndex < strLength) {
> > char high = testString.charAt(stringIndex++);
> >
> > if (contains(high) &&
> > (offset = next.matches(stringIndex, testString,
> > matchResult)) > 0) {
> > return offset;
> > }
> > ...
> > I corrected the patch and attached it to the issue.
> > I verified that regex and luni tests pass normally with the patch
> > applied.
> >
> > Thanks,
> > Anton
> >
> Hi Anton:
> It must be very excited to handle such a complex problem. :-)
>
> But after applying the new patch (and test patch applied), I still got
> problems:
> Of test class: org.apache.harmony.tests.java.util.regex.PatternTest, 4
> test methods fail on RI:
> testCanonEqFlag:
> java.util.regex.PatternSyntaxException: Unclosed group near index 59
> (?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
> ^
> at java.util.regex.Pattern.error(Pattern.java:1650)
> at java.util.regex.Pattern.accept(Pattern.java:1508)
> at java.util.regex.Pattern.group0(Pattern.java:2460)
> at java.util.regex.Pattern.sequence(Pattern.java:1715)
> at java.util.regex.Pattern.expr(Pattern.java:1687)
> at java.util.regex.Pattern.compile(Pattern.java:1397)
> at java.util.regex.Pattern.<init>(Pattern.java:1124)
> at java.util.regex.Pattern.compile(Pattern.java:840)
> at
> org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(
> PatternTest.java:1060)
>
> testIndexesCanonicalEq:
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:47)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at
>
> org.apache.harmony.tests.java.util.regex.PatternTest.testIndexesCanonicalEq
> (PatternTest.java:1247)
>
> testCanonEqFlagWithSupplementaryCharacters:
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:47)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertTrue(Assert.java:27)
> at
>
> org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlagWithSupplementaryCharacters
> (PatternTest.java:1275)
>
> testPredefinedClassesWithSurrogatesSupplementary
> junit.framework.AssertionFailedError: null
> at junit.framework.Assert.fail(Assert.java:47)
> at junit.framework.Assert.assertTrue(Assert.java:20)
> at junit.framework.Assert.assertFalse(Assert.java:34)
> at junit.framework.Assert.assertFalse(Assert.java:41)
> at
>
> org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary
> (PatternTest.java:1477)
> If they are the bugs of RI, shall we add comments for them in the test
> case?
>
> and Error message printed out on console on Harmony. Since there are
> test cases use System.out instead of assert, I could not locate where
> these error message comes from:
> java.util.regex.PatternSyntaxException: unmatched ) near index: 1
> b)a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 4
> bcde)a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 5
> bbg())a
> ^
> java.util.regex.PatternSyntaxException: unmatched ) near index: 7
> cdb(?i))a
> ^
> And last, the good news is luni tests do pass. :-)
>
> Best regards
>
> --
> Spark Shen
> China Software Development Lab, IBM
>
>
>
>

Re: [classlib][regex|luni] build break

Posted by Spark Shen <sm...@gmail.com>.
Anton Ivanov wrote:
> On 10/10/06, Anton Ivanov <an...@gmail.com> wrote:
>>
>>
>>
>> On 10/10/06, Tim Ellison <t....@gmail.com> wrote:
>> >
>> > So I checked in a patch for HARMONY-688's regex fix, and it passed the
>> > regex unit tests, but causes the existing luni tests to fail in
>> > java.util.Scanner. I've not figured out the base cause of the failure
>> > so I've backed out the changes.
>> >
>> > Regards,
>> > Tim
>> >
>> > --
>> >
>> > Tim Ellison (t.p.ellison@gmail.com )
>> > IBM Java technology centre, UK.
>> >
>>
>>
>>
>>
>>
>> This is my patch.
>> I'll look into this problem and try to correct the patch.
>>
>> Thanks,
>> Anton
>>
> There was a bug in the newly created class SupplRangeSet.java.
> There was the following code in the method matches() of 
> SupplRangeSet.java:
> ...
> if (stringIndex < strLength) {
> char high = testString.charAt(stringIndex++);
>
> if (contains(high) &&
> next.matches(stringIndex, testString, matchResult) > 0)
> {
> return 1;
> }
> ...
> But it is wrong simply to return 1, though we can read about method
> matches() in AbstractSet.java comments:
>
> "Checks if this node matches in given position and recursively call
> next node matches on positive self match. Returns positive integer if
> entire match succeed, negative otherwise
> return -1 if match fails or n > 0;"
> In fact method matches() returns not only a positive n > 0. The n is an
> offset in case of a positive
> match attempt. This fact is took into account in all old classes of
> java.util.regex, but I forgot this fact in SupplRangeSet.java
> So I corrected method matches() of the SupplRangeSet class as follows:
> ...
> int offset = -1;
> if (stringIndex < strLength) {
> char high = testString.charAt(stringIndex++);
>
> if (contains(high) &&
> (offset = next.matches(stringIndex, testString,
> matchResult)) > 0) {
> return offset;
> }
> ...
> I corrected the patch and attached it to the issue.
> I verified that regex and luni tests pass normally with the patch 
> applied.
>
> Thanks,
> Anton
>
Hi Anton:
It must be very exciting to handle such a complex problem. :-)

But after applying the new patch (with the test patch applied as well), I still
see problems:
In the test class org.apache.harmony.tests.java.util.regex.PatternTest, 4
test methods fail on the RI:
testCanonEqFlag:
java.util.regex.PatternSyntaxException: Unclosed group near index 59
(?:ǠI|ǠI|ǠI|ȦĪ|ȦĪ|ȦĪ|ǠI|ǠI|Aİ̄(?:Ìc|Ìc|Ic̀)db(ac)
^
at java.util.regex.Pattern.error(Pattern.java:1650)
at java.util.regex.Pattern.accept(Pattern.java:1508)
at java.util.regex.Pattern.group0(Pattern.java:2460)
at java.util.regex.Pattern.sequence(Pattern.java:1715)
at java.util.regex.Pattern.expr(Pattern.java:1687)
at java.util.regex.Pattern.compile(Pattern.java:1397)
at java.util.regex.Pattern.<init>(Pattern.java:1124)
at java.util.regex.Pattern.compile(Pattern.java:840)
at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlag(PatternTest.java:1060)

testIndexesCanonicalEq:
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertTrue(Assert.java:27)
at org.apache.harmony.tests.java.util.regex.PatternTest.testIndexesCanonicalEq(PatternTest.java:1247)

testCanonEqFlagWithSupplementaryCharacters:
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertTrue(Assert.java:27)
at org.apache.harmony.tests.java.util.regex.PatternTest.testCanonEqFlagWithSupplementaryCharacters(PatternTest.java:1275)

testPredefinedClassesWithSurrogatesSupplementary:
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertFalse(Assert.java:34)
at junit.framework.Assert.assertFalse(Assert.java:41)
at org.apache.harmony.tests.java.util.regex.PatternTest.testPredefinedClassesWithSurrogatesSupplementary(PatternTest.java:1477)
If these are RI bugs, shall we add comments about them in the test case?

Also, error messages are printed to the console on Harmony. Since some
test cases use System.out instead of assertions, I could not locate where
these error messages come from:
java.util.regex.PatternSyntaxException: unmatched ) near index: 1
b)a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 4
bcde)a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 5
bbg())a
^
java.util.regex.PatternSyntaxException: unmatched ) near index: 7
cdb(?i))a
^
And last, the good news is luni tests do pass. :-)

Best regards

-- 
Spark Shen
China Software Development Lab, IBM




Re: [classlib][regex|luni] build break

Posted by Spark Shen <sm...@gmail.com>.
Anton Ivanov wrote:
> On 10/10/06, Anton Ivanov <an...@gmail.com> wrote:
>>
>>
>>
>> On 10/10/06, Tim Ellison <t....@gmail.com> wrote:
>> >
>> > So I checked in a patch for HARMONY-688's regex fix, and it passed the
>> > regex unit tests, but causes the existing luni tests to fail in
>> > java.util.Scanner. I've not figured out the base cause of the failure
>> > so I've backed out the changes.
>> >
>> > Regards,
>> > Tim
>> >
>> > --
>> >
>> > Tim Ellison (t.p.ellison@gmail.com )
>> > IBM Java technology centre, UK.
>> >
>>
>>
>>
>>
>>
>> This is my patch.
>> I'll look into this problem and try to correct the patch.
>>
>> Thanks,
>> Anton
>>
> There was a bug in the newly created class SupplRangeSet.java.
> There was the following code in the method matches() of 
> SupplRangeSet.java:
> ...
> if (stringIndex < strLength) {
> char high = testString.charAt(stringIndex++);
>
> if (contains(high) &&
> next.matches(stringIndex, testString, matchResult) > 0)
> {
> return 1;
> }
> ...
> But it is wrong simply to return 1, though we can read about method
> matches() in AbstractSet.java comments:
>
> "Checks if this node matches in given position and recursively call
> next node matches on positive self match. Returns positive integer if
> entire match succeed, negative otherwise
> return -1 if match fails or n > 0;"
> In fact method matches() returns not only a positive n > 0. The n is an
> offset in case of a positive
> match attempt. This fact is took into account in all old classes of
> java.util.regex, but I forgot this fact in SupplRangeSet.java
> So I corrected method matches() of the SupplRangeSet class as follows:
> ...
> int offset = -1;
> if (stringIndex < strLength) {
> char high = testString.charAt(stringIndex++);
>
> if (contains(high) &&
> (offset = next.matches(stringIndex, testString,
> matchResult)) > 0) {
> return offset;
> }
> ...
> I corrected the patch and attached it to the issue.
> I verified that regex and luni tests pass normally with the patch 
> applied.
>
> Thanks,
> Anton
>
OK, I will check it ASAP.

Best regards

-- 
Spark Shen
China Software Development Lab, IBM




Re: [classlib][regex|luni] build break

Posted by Anton Ivanov <an...@gmail.com>.
On 10/10/06, Anton Ivanov <an...@gmail.com> wrote:
>
>
>
> On 10/10/06, Tim Ellison <t....@gmail.com> wrote:
> >
> > So I checked in a patch for HARMONY-688's regex fix, and it passed the
> > regex unit tests, but causes the existing luni tests to fail in
> > java.util.Scanner.  I've not figured out the base cause of the failure
> > so I've backed out the changes.
> >
> > Regards,
> > Tim
> >
> > --
> >
> > Tim Ellison (t.p.ellison@gmail.com )
> > IBM Java technology centre, UK.
> >
>
>
>
>
>
> This is my patch.
> I'll look into this problem and try to correct the patch.
>
> Thanks,
> Anton
>
There was a bug in the newly created class SupplRangeSet.java.
Its matches() method contained the following code:
...
        if (stringIndex < strLength) {
            char high = testString.charAt(stringIndex++);

            if (contains(high) &&
                    next.matches(stringIndex, testString, matchResult) > 0) {
                return 1;
            }
...
But simply returning 1 is wrong, even though the comment on
matches() in AbstractSet.java reads:

 "Checks if this node matches in given position and recursively call
  next node matches on positive self match. Returns positive integer if
  entire match succeed, negative otherwise
  return -1 if match fails or n > 0;"
In fact, matches() does not return just any positive n > 0: on a
successful match attempt, n is an offset.
All the older classes of java.util.regex take this into account,
but I forgot it in SupplRangeSet.java.
So I corrected the matches() method of the SupplRangeSet class as follows:
...
        int offset = -1;
        if (stringIndex < strLength) {
            char high = testString.charAt(stringIndex++);

            if (contains(high) &&
                    (offset = next.matches(stringIndex, testString,
                            matchResult)) > 0) {
                return offset;
            }
...
I corrected the patch and attached it to the issue.
I verified that regex and luni tests pass normally with the patch applied.
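
To illustrate why the return value matters, here is a self-contained toy model
of a chain of matcher nodes. It is only a sketch: the classes Node, End and
Digit are hypothetical and much simpler than the real AbstractSet hierarchy,
and the three-digit chain is an assumption made for the example. The final
node returns the index it was reached at, and every node before it is expected
to pass that value back up unchanged.

```java
public class OffsetChainDemo {
    /** Toy stand-in for a regex node; not the real AbstractSet API. */
    abstract static class Node {
        Node next;
        /** Returns the end offset of the match, or -1 on failure. */
        abstract int matches(int index, CharSequence text);
    }

    /** Terminal node: reports the offset at which the match ended. */
    static final class End extends Node {
        int matches(int index, CharSequence text) {
            return index;
        }
    }

    /** Consumes one digit, then delegates to the next node. */
    static final class Digit extends Node {
        private final boolean propagateOffset;

        Digit(boolean propagateOffset, Node next) {
            this.propagateOffset = propagateOffset;
            this.next = next;
        }

        int matches(int index, CharSequence text) {
            if (index < text.length() && Character.isDigit(text.charAt(index))) {
                int offset = next.matches(index + 1, text);
                if (offset > 0) {
                    // Corrected behaviour returns the offset; the buggy variant returns 1.
                    return propagateOffset ? offset : 1;
                }
            }
            return -1;
        }
    }

    /** Builds a chain of three Digit nodes and runs it from index 0. */
    static int run(boolean propagateOffset, String text) {
        Node chain = new End();
        for (int i = 0; i < 3; i++) {
            chain = new Digit(propagateOffset, chain);
        }
        return chain.matches(0, text);
    }

    public static void main(String[] args) {
        System.out.println(run(true, "123"));   // 3: the real end offset
        System.out.println(run(false, "123"));  // 1: the match succeeded, but the offset is lost
        System.out.println(run(true, "12a"));   // -1: no match
    }
}
```

A caller that only tests the result against 0 cannot tell the two variants
apart, which would explain how the regex tests passed. A caller that uses the
result as the end of the match, as a possessive quantifier would need to, gets
the wrong position from the variant that returns 1.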

Thanks,
Anton

Re: [classlib][regex|luni] build break

Posted by Anton Ivanov <an...@gmail.com>.
On 10/10/06, Tim Ellison <t....@gmail.com> wrote:
>
> So I checked in a patch for HARMONY-688's regex fix, and it passed the
> regex unit tests, but causes the existing luni tests to fail in
> java.util.Scanner.  I've not figured out the base cause of the failure
> so I've backed out the changes.
>
> Regards,
> Tim
>
> --
>
> Tim Ellison (t.p.ellison@gmail.com)
> IBM Java technology centre, UK.
>





This is my patch.
I'll look into this problem and try to correct the patch.

Thanks,
Anton