Posted to issues@commons.apache.org by "Michael Ryan (JIRA)" <ji...@apache.org> on 2018/09/05 17:31:00 UTC

[jira] [Comment Edited] (LANG-1406) StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase

    [ https://issues.apache.org/jira/browse/LANG-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604694#comment-16604694 ] 

Michael Ryan edited comment on LANG-1406 at 9/5/18 5:30 PM:
------------------------------------------------------------

I've been thinking - how do case-insensitive regular expressions handle this? Theoretically these should do the same thing:
{code}
StringUtils.replaceIgnoreCase("\u0130x", "x", "");
Pattern.compile("x", Pattern.CASE_INSENSITIVE).matcher("\u0130x").replaceAll("");
{code}
The Matcher.replaceAll(String) method does not throw an exception.

So what is the difference? The Pattern.newSingle(int) method is the key thing to look at. It uses Character.toUpperCase(char) and Character.toLowerCase(char), which do not behave the same way as String.toUpperCase() and String.toLowerCase(). The Character methods always map one character to exactly one character, so the length cannot change, whereas the String methods can expand a single character into several.
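
To make the length difference concrete, here is a small sketch. The class and method names are made up for illustration, and Locale.ROOT is my assumption so that the result does not depend on the default locale (a Turkish locale lowercases this character differently).

```java
import java.util.Locale;

// Hypothetical demo class, not part of Commons Lang.
class CaseMappingLengthDemo {

    // Length after String-level lowercasing; this may differ from text.length().
    static int stringLowerLength(String text) {
        return text.toLowerCase(Locale.ROOT).length();
    }

    // Char-level lowercasing maps one char to exactly one char.
    static char charLower(char c) {
        return Character.toLowerCase(c);
    }

    public static void main(String[] args) {
        String text = "\u0130x";
        System.out.println("original length:         " + text.length());
        System.out.println("String.toLowerCase:      " + stringLowerLength(text));
        System.out.println("Character.toLowerCase:   " + (int) charLower(text.charAt(0)));
    }
}
```

The String-level result is one char longer than the input, which is the kind of mismatch that invalidates indices computed on the lowercased copy.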

So I think a possible naive solution to this would be to call Character.toLowerCase() on each character in the String and then append the characters together into a new String.
{code}
String text = "foo";
char[] chars = text.toCharArray();
for (int i = 0; i < chars.length; i++) {
    chars[i] = Character.toLowerCase(chars[i]);
}
String lowerText = new String(chars);
{code}
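
Here is a rough sketch of how that per-character lowering could plug into a replace loop. This is not the actual StringUtils code, and the class and method names are hypothetical. The point is only that indices found in the lowered copy stay valid for the original text, because both have the same length.

```java
// Hypothetical sketch, not the Commons Lang implementation.
class ReplaceIgnoreCaseSketch {

    // Lowercases char by char, so the result has the same length as the input.
    static String lowerPerChar(String text) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = Character.toLowerCase(chars[i]);
        }
        return new String(chars);
    }

    // Replaces every case-insensitive occurrence of search in text.
    static String replaceIgnoreCase(String text, String search, String replacement) {
        if (text == null || search == null || replacement == null || search.isEmpty()) {
            return text;
        }
        String lowerText = lowerPerChar(text);
        String lowerSearch = lowerPerChar(search);
        StringBuilder out = new StringBuilder(text.length());
        int start = 0;
        int end;
        // Indices from lowerText are safe to use on text: the lengths are equal.
        while ((end = lowerText.indexOf(lowerSearch, start)) >= 0) {
            out.append(text, start, end).append(replacement);
            start = end + search.length();
        }
        out.append(text, start, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints 1: only the dotted capital I remains, and no exception is thrown.
        System.out.println(replaceIgnoreCase("\u0130x", "x", "").length());
    }
}
```

Two caveats with this sketch: Character.toLowerCase(char) works on single UTF-16 chars, so supplementary characters are not case-mapped, and "\u0130" would now match a plain "i", which may or may not be the desired behavior.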



> StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase
> ----------------------------------------------------------------
>
>                 Key: LANG-1406
>                 URL: https://issues.apache.org/jira/browse/LANG-1406
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>            Reporter: Michael Ryan
>            Priority: Major
>
> STEPS TO REPRODUCE:
> {code}
> StringUtils.replaceIgnoreCase("\u0130x", "x", "")
> {code}
> EXPECTED: "\u0130" is returned.
> ACTUAL: StringIndexOutOfBoundsException
> This happens because the replace method is assuming that text.length() == text.toLowerCase().length(), which is not true for certain characters.


