Posted to user@commons.apache.org by Xeno Amess <xe...@gmail.com> on 2020/04/28 21:04:45 UTC

[commons-lang3] potential bug in CharSequenceUtils?

Well, while looking at StringUtils I found something like this (in CharSequenceUtils.regionMatches):

final char c1 = cs.charAt(index1++);
final char c2 = substring.charAt(index2++);

if (c1 == c2) {
    continue;
}

if (!ignoreCase) {
    return false;
}

// The same check as in String.regionMatches():
if (Character.toUpperCase(c1) != Character.toUpperCase(c2)
        && Character.toLowerCase(c1) != Character.toLowerCase(c2)) {
    return false;
}

But it is actually not quite the same as what String.regionMatches does.
The relevant code in String.regionMatches in JDK 8 is actually:

char c1 = ta[to++];
char c2 = pa[po++];
if (c1 == c2) {
    continue;
}
if (ignoreCase) {
    // If characters don't match but case may be ignored,
    // try converting both characters to uppercase.
    // If the results match, then the comparison scan should
    // continue.
    char u1 = Character.toUpperCase(c1);
    char u2 = Character.toUpperCase(c2);
    if (u1 == u2) {
        continue;
    }
    // Unfortunately, conversion to uppercase does not work properly
    // for the Georgian alphabet, which has strange rules about case
    // conversion.  So we need to make one last check before
    // exiting.
    if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
        continue;
    }
}

See, the chars passed to Character.toLowerCase are actually u1 and u2, whereas
the logic in CharSequenceUtils passes c1 and c2.
If the two were functionally equivalent, why would the Oracle developers create
the two variables u1 and u2? That would be a waste of time.
So I think it might be a bug.
But I myself know nothing about Georgian.
Is there anybody familiar with the Georgian alphabet and willing to debug this
further?
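
The difference between the two checks can be sketched in isolation. The class and method names below are hypothetical and exist only for this illustration. The pair used is the Turkish dotless small i (U+0131) and dotted capital I (U+0130) rather than a Georgian letter, on the assumption that Character.toUpperCase and Character.toLowerCase follow the simple Unicode case mappings for these two chars.

```java
final class CaseInsensitiveCheckDemo {

    // The check quoted from CharSequenceUtils: both conversions are
    // applied to the original chars c1 and c2.
    static boolean commonsStyleMatch(char c1, char c2) {
        if (c1 == c2) {
            return true;
        }
        return !(Character.toUpperCase(c1) != Character.toUpperCase(c2)
                && Character.toLowerCase(c1) != Character.toLowerCase(c2));
    }

    // The check quoted from JDK 8 String.regionMatches: toLowerCase is
    // applied to the already uppercased chars u1 and u2.
    static boolean jdkStyleMatch(char c1, char c2) {
        if (c1 == c2) {
            return true;
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) {
            return true;
        }
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    public static void main(String[] args) {
        char dotlessSmallI = '\u0131';  // LATIN SMALL LETTER DOTLESS I
        char dottedCapitalI = '\u0130'; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // Uppercase forms differ (U+0049 vs U+0130) and lowercase forms
        // differ (U+0131 vs U+0069), so this check rejects the pair.
        System.out.println("commons-style: "
                + commonsStyleMatch(dotlessSmallI, dottedCapitalI)); // false

        // Lowercasing the uppercased forms gives U+0069 for both,
        // so this check accepts the pair.
        System.out.println("JDK-style: "
                + jdkStyleMatch(dotlessSmallI, dottedCapitalI)); // true

        // For comparison, the JDK method itself on one-char strings.
        System.out.println("String.regionMatches: "
                + "\u0131".regionMatches(true, 0, "\u0130", 0, 1));
    }
}
```

For ordinary pairs such as 'a' and 'A' both checks agree; they only diverge when neither the uppercase nor the lowercase forms of the original chars match, but the lowercase forms of the uppercased chars do.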

Re: [commons-lang3] potential bug in CharSequenceUtils?

Posted by Xeno Amess <xe...@gmail.com>.

Yes, it really is a bug.
I created a fix PR (with test code) at
https://github.com/apache/commons-lang/pull/529
Please review it when you have time.


Xeno Amess <xe...@gmail.com> wrote on Wed, Apr 29, 2020 at 5:04 AM:

> Well, while looking at StringUtils I found something like this (in CharSequenceUtils.regionMatches):
>
> final char c1 = cs.charAt(index1++);
> final char c2 = substring.charAt(index2++);
>
> if (c1 == c2) {
>     continue;
> }
>
> if (!ignoreCase) {
>     return false;
> }
>
> // The same check as in String.regionMatches():
> if (Character.toUpperCase(c1) != Character.toUpperCase(c2)
>         && Character.toLowerCase(c1) != Character.toLowerCase(c2)) {
>     return false;
> }
>
> But it is actually not quite the same as what String.regionMatches does.
> The relevant code in String.regionMatches in JDK 8 is actually:
>
> char c1 = ta[to++];
> char c2 = pa[po++];
> if (c1 == c2) {
>     continue;
> }
> if (ignoreCase) {
>     // If characters don't match but case may be ignored,
>     // try converting both characters to uppercase.
>     // If the results match, then the comparison scan should
>     // continue.
>     char u1 = Character.toUpperCase(c1);
>     char u2 = Character.toUpperCase(c2);
>     if (u1 == u2) {
>         continue;
>     }
>     // Unfortunately, conversion to uppercase does not work properly
>     // for the Georgian alphabet, which has strange rules about case
>     // conversion.  So we need to make one last check before
>     // exiting.
>     if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
>         continue;
>     }
> }
>
> See, the chars passed to Character.toLowerCase are actually u1 and u2, whereas
> the logic in CharSequenceUtils passes c1 and c2.
> If the two were functionally equivalent, why would the Oracle developers create
> the two variables u1 and u2? That would be a waste of time.
> So I think it might be a bug.
> But I myself know nothing about Georgian.
> Is there anybody familiar with the Georgian alphabet and willing to debug this
> further?
>
>
>
