Posted to issues@commons.apache.org by "Gary Gregory (JIRA)" <ji...@apache.org> on 2010/03/14 01:40:27 UTC
[jira] Created: (LANG-607) StringUtils.containsAny methods
incorrectly matches Unicode 2.0+ supplementary characters.
StringUtils.containsAny methods incorrectly matches Unicode 2.0+ supplementary characters.
------------------------------------------------------------------------------------------
Key: LANG-607
URL: https://issues.apache.org/jira/browse/LANG-607
Project: Commons Lang
Issue Type: Bug
Components: lang.*
Affects Versions: 2.5
Environment: java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
Microsoft Windows [Version 6.0.6002]
Apache Maven 2.2.1 (r801777; 2009-08-06 12:16:01-0700)
Java version: 1.6.0_16
Java home: C:\Program Files\Java\jdk1.6.0_16\jre
Default locale: en_US, platform encoding: Cp1252
OS name: "windows vista" version: "6.0" arch: "amd64" Family: "windows"
Reporter: Gary Gregory
Assignee: Gary Gregory
Priority: Minor
Fix For: 3.0
The StringUtils.containsAny methods incorrectly match Unicode 2.0+ supplementary characters.
For example, define test fixtures for the Unicode characters U+20000 and U+20001, which are written in Java source as the surrogate pairs "\uD840\uDC00" and "\uD840\uDC01":
private static final String CharU20000 = "\uD840\uDC00";
private static final String CharU20001 = "\uD840\uDC01";
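As a minimal sketch of where those escapes come from (the class and method names here are hypothetical, not part of Commons Lang), the JRE call Character.toChars can build the surrogate pair from the code point. Each fixture is one code point but two Java char values:

```java
public class SurrogatePairSketch {

    // Hypothetical helper: builds a String from a single Unicode code point.
    // Character.toChars returns one char for BMP code points and a
    // high/low surrogate pair for supplementary code points.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x20000);
        System.out.println(s.equals("\uD840\uDC00"));        // true
        System.out.println(s.length());                      // 2 (char units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (code point)
    }
}
```

Note that both fixtures share the same high surrogate, \uD840, and differ only in the low surrogate.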
The JRE handles supplementary characters correctly in this call:
assertEquals(-1, CharU20000.indexOf(CharU20001));
But this is broken; both of these assertions fail:
assertEquals(false, StringUtils.containsAny(CharU20000, CharU20001));
assertEquals(false, StringUtils.containsAny(CharU20001, CharU20000));
This is fine, because StringUtils.contains calls the JRE to perform the match:
assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20000));
assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20001));
assertEquals(true, StringUtils.contains(CharU20000, CharU20000));
assertEquals(false, StringUtils.contains(CharU20000, CharU20001));
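The difference can be sketched as follows. This is an illustration only, not the Commons Lang source and not the attached patch; the class and both method names are hypothetical. The first method compares one char at a time, which is assumed here to be the kind of comparison behind the failing assertions. The second compares whole code points:

```java
import java.util.HashSet;
import java.util.Set;

public class SupplementaryContainsAnySketch {

    // Hypothetical char-by-char comparison. The two fixtures share the high
    // surrogate \uD840, so this reports a match even though the code points
    // differ.
    static boolean containsAnyByChar(String str, String searchChars) {
        if (str == null || searchChars == null) {
            return false;
        }
        for (int i = 0; i < str.length(); i++) {
            if (searchChars.indexOf(str.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical code-point-aware comparison: a surrogate pair is treated
    // as one unit, so U+20000 and U+20001 do not match each other.
    static boolean containsAnyByCodePoint(String str, String searchChars) {
        if (str == null || searchChars == null) {
            return false;
        }
        Set<Integer> wanted = new HashSet<Integer>();
        for (int i = 0; i < searchChars.length(); ) {
            int cp = searchChars.codePointAt(i);
            wanted.add(cp);
            i += Character.charCount(cp); // advance 1 or 2 char units
        }
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            if (wanted.contains(cp)) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        String charU20000 = "\uD840\uDC00";
        String charU20001 = "\uD840\uDC01";
        System.out.println(containsAnyByChar(charU20000, charU20001));      // true
        System.out.println(containsAnyByCodePoint(charU20000, charU20001)); // false
    }
}
```

The code-point variant trades a little extra work (building a set) for correctness on supplementary characters. A fix could equally keep the char loop and check the following low surrogate whenever a high surrogate matches.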
More than you want to know:
- http://java.sun.com/developer/technicalArticles/Intl/Supplementary/
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (LANG-607) StringUtils methods do not handle
Unicode 2.0+ supplementary characters correctly.
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881100#action_12881100 ]
Henri Yandell commented on LANG-607:
------------------------------------
Noting that fixing this isn't a change in binary compatibility, so we can release without it. That said, it needs working on.
[jira] Updated: (LANG-607) StringUtils methods incorrectly matches
Unicode 2.0+ supplementary characters.
Posted by "Gary Gregory (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Gregory updated LANG-607:
------------------------------
Summary: StringUtils methods incorrectly matches Unicode 2.0+ supplementary characters. (was: StringUtils.containsAny methods incorrectly matches Unicode 2.0+ supplementary characters.)
Renaming the ticket so that this issue can also be fixed in other StringUtils methods.
[jira] Updated: (LANG-607) StringUtils methods do not handle
Unicode 2.0+ supplementary characters correctly.
Posted by "Gary Gregory (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Gregory updated LANG-607:
------------------------------
Summary: StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly. (was: StringUtils methods incorrectly matches Unicode 2.0+ supplementary characters.)
[jira] Updated: (LANG-607) StringUtils.containsAny methods
incorrectly matches Unicode 2.0+ supplementary characters.
Posted by "Gary Gregory (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Gregory updated LANG-607:
------------------------------
Attachment: LANG-607.diff
Attaching patch for the record.