Posted to issues@commons.apache.org by "Gary Gregory (JIRA)" <ji...@apache.org> on 2010/03/15 21:29:27 UTC

[jira] Updated: (LANG-607) StringUtils methods incorrectly matches Unicode 2.0+ supplementary characters.

     [ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Gregory updated LANG-607:
------------------------------

    Summary: StringUtils methods incorrectly matches Unicode 2.0+ supplementary characters.  (was: StringUtils.containsAny methods incorrectly matches Unicode 2.0+ supplementary characters.)

Renaming the ticket so that the fix also covers the other StringUtils methods affected by this issue.

> StringUtils methods incorrectly matches Unicode 2.0+ supplementary characters.
> ------------------------------------------------------------------------------
>
>                 Key: LANG-607
>                 URL: https://issues.apache.org/jira/browse/LANG-607
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 2.5
>         Environment: java version "1.6.0_16"
> Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
> Microsoft Windows [Version 6.0.6002]
> Apache Maven 2.2.1 (r801777; 2009-08-06 12:16:01-0700)
> Java version: 1.6.0_16
> Java home: C:\Program Files\Java\jdk1.6.0_16\jre
> Default locale: en_US, platform encoding: Cp1252
> OS name: "windows vista" version: "6.0" arch: "amd64" Family: "windows"
>            Reporter: Gary Gregory
>            Assignee: Gary Gregory
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: LANG-607.diff
>
>
> The StringUtils.containsAny methods incorrectly match Unicode 2.0+ supplementary characters.
> For example, define test fixtures for the Unicode characters U+20000 and U+20001, which are written in Java source as the surrogate pairs "\uD840\uDC00" and "\uD840\uDC01":
> 	private static final String CharU20000 = "\uD840\uDC00";
> 	private static final String CharU20001 = "\uD840\uDC01";
> The JRE handles Unicode supplementary characters correctly in this call:
> 	assertEquals(-1, CharU20000.indexOf(CharU20001));
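> But these assertions fail: containsAny reports a match, apparently because it compares individual char values and both strings share the high surrogate \uD840: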
> 	assertEquals(false, StringUtils.containsAny(CharU20000, CharU20001));
> 	assertEquals(false, StringUtils.containsAny(CharU20001, CharU20000));
> This is fine:
> 	assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20000));
> 	assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20001));
> 	assertEquals(true, StringUtils.contains(CharU20000, CharU20000));
> 	assertEquals(false, StringUtils.contains(CharU20000, CharU20001));
> These pass because StringUtils.contains calls the JRE to perform the match.
> More than you want to know:
> - http://java.sun.com/developer/technicalArticles/Intl/Supplementary/
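
The behavior described in the report can be reproduced with the JDK alone. The sketch below is not the Commons Lang source or the attached LANG-607.diff; the class and both method names are hypothetical and exist only for illustration. It assumes the false match comes from comparing UTF-16 char values one at a time, and contrasts that with a comparison that walks whole code points. It relies only on String.codePointAt, String.indexOf(int), and Character.charCount, and it assumes well-formed input (no unpaired surrogates).

```java
class SupplementaryContainsAny {

    // Fixtures from the report: U+20000 and U+20001 share the high surrogate 0xD840.
    static final String CHAR_U20000 = "\uD840\uDC00";
    static final String CHAR_U20001 = "\uD840\uDC01";

    // Hypothetical char-by-char matcher: each UTF-16 code unit is tested on its own,
    // so a shared high surrogate is enough to report a match.
    static boolean naiveContainsAny(String str, String searchChars) {
        for (int i = 0; i < str.length(); i++) {
            if (searchChars.indexOf(str.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical code-point-aware matcher: a surrogate pair is read as one
    // code point and looked up as a whole in searchChars.
    static boolean codePointContainsAny(String str, String searchChars) {
        int i = 0;
        while (i < str.length()) {
            int codePoint = str.codePointAt(i);
            if (searchChars.indexOf(codePoint) >= 0) {
                return true;
            }
            // Advance by 2 chars for a supplementary character, 1 otherwise.
            i += Character.charCount(codePoint);
        }
        return false;
    }

    public static void main(String[] args) {
        // Unwanted match on the shared high surrogate.
        System.out.println(naiveContainsAny(CHAR_U20000, CHAR_U20001));     // true
        // Whole code points differ, so no match in either direction.
        System.out.println(codePointContainsAny(CHAR_U20000, CHAR_U20001)); // false
        System.out.println(codePointContainsAny(CHAR_U20001, CHAR_U20000)); // false
    }
}
```

Walking code points costs one extra Character.charCount call per character and leaves results for strings in the Basic Multilingual Plane unchanged, since every such code point occupies a single char.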
