
Posted to issues@commons.apache.org by "Gary D. Gregory (JIRA)" <ji...@apache.org> on 2011/07/14 19:01:00 UTC

[jira] [Commented] (LANG-720) StringEscapeUtils.escapeXml(input) outputs wrong results when an input contains characters in Supplementary Planes.

    [ https://issues.apache.org/jira/browse/LANG-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065387#comment-13065387 ] 

Gary D. Gregory commented on LANG-720:
--------------------------------------

The patch does not break any unit tests when applied to the latest code from SVN, but it does not include a unit test of its own.

Perhaps we should hold off since we are in the middle of a VOTE.

> StringEscapeUtils.escapeXml(input) outputs wrong results when an input contains characters in Supplementary Planes.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: LANG-720
>                 URL: https://issues.apache.org/jira/browse/LANG-720
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*, lang.text.translate.*
>    Affects Versions: 3.0
>            Reporter: Taro Yabuki
>              Labels: patch
>         Attachments: CharSequenceTranslator.java.20110714.diff
>
>
> Hello.
> I use StringEscapeUtils.escapeXml(input) to escape special characters for XML.
> This method outputs wrong results when input contains characters in Supplementary Planes.
> String str1 = "\uD842\uDFB7" + "A";
> String str2 = StringEscapeUtils.escapeXml(str1);
> // The value of str2 must be equal to the one of str1,
> // because str1 does not contain characters to be escaped.
> // However, str2 is different from str1.
> System.out.println(URLEncoder.encode(str1, "UTF-16BE")); //%D8%42%DF%B7A
> System.out.println(URLEncoder.encode(str2, "UTF-16BE")); //%D8%42%DF%B7%FF%FD
> The cause of this problem is that the loop that translates the input character by character uses the wrong bound.
> In CharSequenceTranslator.translate(CharSequence input, Writer out),
> loop counter "i" moves from 0 to Character.codePointCount(input, 0, input.length()),
> but it should move from 0 to input.length().
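
The difference between the two loop bounds can be illustrated with a small, self-contained sketch. This is not the Commons Lang source: the class SupplementaryLoopSketch and the methods copyBuggy and copyFixed are hypothetical names, and the loops only copy the input instead of translating it. The sketch assumes the faulty loop advances one char per step up to the code point count, while the corrected loop runs up to input.length() and advances by Character.charCount.

```java
// Hypothetical sketch, not the Commons Lang implementation.
class SupplementaryLoopSketch {

    // Assumed faulty pattern: the bound is a code point count,
    // but the index i is a char (UTF-16 code unit) index.
    static String copyBuggy(CharSequence input) {
        StringBuilder out = new StringBuilder();
        int sz = Character.codePointCount(input, 0, input.length());
        for (int i = 0; i < sz; i++) {
            // At a low surrogate, codePointAt returns the lone surrogate.
            out.appendCodePoint(Character.codePointAt(input, i));
        }
        return out.toString();
    }

    // Corrected pattern: the bound is the char length, and the index
    // skips both halves of a surrogate pair at once.
    static String copyFixed(CharSequence input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int codePoint = Character.codePointAt(input, i);
            out.appendCodePoint(codePoint);
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+20BB7 (a supplementary character) followed by "A".
        String str1 = "\uD842\uDFB7" + "A";
        System.out.println(copyBuggy(str1).equals(str1)); // false
        System.out.println(copyFixed(str1).equals(str1)); // true
    }
}
```

For the three-char input from the report, the code point count is 2, so the faulty loop stops before it reaches "A" and emits the low surrogate a second time, which is consistent with the %FF%FD replacement bytes shown above.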
