Posted to commits@harmony.apache.org by "Anton Ivanov (JIRA)" <ji...@apache.org> on 2006/10/11 14:39:28 UTC

[jira] Updated: (HARMONY-688) java.util.regex.Matcher does not support Unicode supplementary characters

     [ http://issues.apache.org/jira/browse/HARMONY-688?page=all ]

Anton Ivanov updated HARMONY-688:
---------------------------------

    Attachment: patch_src_corrected.txt

I corrected the patch (patch_src.txt) and attached it to the issue (patch_src_corrected.txt).
I verified that regex and luni tests pass normally with the patch applied. 

The bug was in the newly created class SupplRangeSet.java.
Its matches() method contained the following code:

...
        if (stringIndex < strLength) {            
            char high = testString.charAt(stringIndex++);
            
            if (contains(high) && 
                    next.matches(stringIndex, testString, matchResult) > 0) {
                return 1;
            }
...

Simply returning 1 is wrong, even though the comment on matches() in AbstractSet.java might suggest that any positive value will do:

 "Checks if this node matches in given position and recursively call
  next node matches on positive self match. Returns positive integer if 
  entire match succeed, negative otherwise
  return -1 if match fails or n > 0;"

In fact, matches() does not return just any positive n > 0: on a successful match attempt, n is an offset.
All the existing classes of java.util.regex take this into account, but I overlooked it in SupplRangeSet.java.
So I corrected the matches() method of the SupplRangeSet class as follows:

...
        int offset = -1;

        if (stringIndex < strLength) {            
            char high = testString.charAt(stringIndex++);
            
            if (contains(high) && 
                    (offset = next.matches(stringIndex, testString, matchResult)) > 0) {
                return offset;
            }
...
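
To illustrate why the returned value matters, here is a minimal, self-contained sketch. It is not the Harmony code: the Node interface, the Accept and RangeNode classes, and the "buggy" flag are all hypothetical names made up for this example, and I am assuming the offset is the index at which the rest of the chain finished matching.

```java
// Hypothetical, simplified model of a chain of regex nodes.
// None of these types are Harmony's real classes.
public class OffsetContractSketch {

    /** Returns -1 on failure, otherwise the offset reported by the rest of the chain. */
    interface Node {
        int matches(int index, CharSequence input);
    }

    /** Terminal node (assumption): reports the index where the match ended. */
    static final class Accept implements Node {
        public int matches(int index, CharSequence input) {
            return index;
        }
    }

    /** Matches one char in [lo, hi], then delegates to the next node. */
    static final class RangeNode implements Node {
        private final char lo;
        private final char hi;
        private final boolean buggy;
        private final Node next;

        RangeNode(char lo, char hi, boolean buggy, Node next) {
            this.lo = lo;
            this.hi = hi;
            this.buggy = buggy;
            this.next = next;
        }

        public int matches(int index, CharSequence input) {
            int offset = -1;
            if (index < input.length()) {
                char c = input.charAt(index++);
                if (c >= lo && c <= hi
                        && (offset = next.matches(index, input)) > 0) {
                    // The buggy variant discards the offset computed downstream.
                    return buggy ? 1 : offset;
                }
            }
            return -1;
        }
    }

    /** Builds a chain of three [a-z] nodes and matches from index 0. */
    public static int run(boolean buggy, String input) {
        Node chain = new RangeNode('a', 'z', buggy,
                new RangeNode('a', 'z', buggy,
                        new RangeNode('a', 'z', buggy, new Accept())));
        return chain.matches(0, input);
    }

    public static void main(String[] args) {
        System.out.println(run(false, "abc")); // offset propagated from the end of the chain
        System.out.println(run(true, "abc"));  // offset lost, caller only sees 1
    }
}
```

In this sketch both variants signal success with a positive value, but only the corrected one passes the downstream offset back to the caller, which is what callers of matches() rely on.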

Thanks,
Anton

> java.util.regex.Matcher does not support Unicode supplementary characters
> -------------------------------------------------------------------------
>
>                 Key: HARMONY-688
>                 URL: http://issues.apache.org/jira/browse/HARMONY-688
>             Project: Harmony
>          Issue Type: Bug
>          Components: Classlib
>            Reporter: Richard Liang
>         Assigned To: Tim Ellison
>         Attachments: patch_src.txt, patch_src_corrected.txt, patch_tests.txt
>
>
> Hello Nikolay,
> The following test case passes on the RI but fails on Harmony. Would you please have a look at this issue? Thanks a lot.
>     public void test_matcher() {
>         Pattern p = Pattern.compile("\\p{javaLowerCase}");
>         Matcher matcher = p.matcher("\uD801\uDC28");
>         assertTrue(matcher.find());
>     }
> Best regards,
> Richard
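
For context on the test string above, the following sketch uses only standard java.lang methods to show that "\uD801\uDC28" is a single supplementary code point stored as two chars, which is why char-by-char matching cannot classify it. The class name and helper methods are made up for illustration.

```java
// Illustrative only: inspects the surrogate pair used in the test case.
public class SupplementaryCharSketch {

    // High surrogate U+D801 followed by low surrogate U+DC28.
    static final String INPUT = "\uD801\uDC28";

    /** Decodes the surrogate pair at index 0 into one code point. */
    public static int codePoint() {
        return INPUT.codePointAt(0);
    }

    /** Lowercase test on the whole code point. */
    public static boolean lowerByCodePoint() {
        return Character.isLowerCase(codePoint());
    }

    /** Lowercase test on the high surrogate alone, as a char-based matcher would do. */
    public static boolean lowerByFirstChar() {
        return Character.isLowerCase(INPUT.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(INPUT.length());                   // 2 chars
        System.out.println(INPUT.codePointCount(0, 2));       // 1 code point
        System.out.println(Integer.toHexString(codePoint())); // 10428
        System.out.println(lowerByCodePoint());               // true
        System.out.println(lowerByFirstChar());               // false
    }
}
```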
