Posted to issues@commons.apache.org by "Matt Benson (JIRA)" <ji...@apache.org> on 2011/07/14 20:51:59 UTC

[jira] [Resolved] (LANG-720) StringEscapeUtils.escapeXml(input) outputs wrong results when an input contains characters in Supplementary Planes.

     [ https://issues.apache.org/jira/browse/LANG-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Benson resolved LANG-720.
------------------------------

    Resolution: Fixed

I was also going to ask for a unit test, but I wanted to improve my understanding of the situation anyway, so I adapted the posted problem code myself. Even though we are currently voting on the release of 3.0.0 from RC4, I don't see why we can't fix this in trunk; the RC tag is already cut. I have used the concept of the patch to rewrite the entire method in question, primarily to avoid modifying a counter variable within a for loop.

Committed revision 1146844.
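The rewrite described above can be sketched as a while loop over char indices that advances its position explicitly, rather than adjusting a for-loop counter. This is a minimal sketch under stated assumptions. It uses a hypothetical per-position translator, translateAt, which returns the number of code points it consumed (0 meaning "not handled") and escapes only the five basic XML entities. The class and method names and the StringBuilder output are illustrative and are not the Commons Lang API.

```java
class XmlEscapeSketch {

    // Hypothetical per-position translator: if the char at 'index' needs escaping,
    // append its entity and report one consumed code point; otherwise return 0.
    static int translateAt(CharSequence input, int index, StringBuilder out) {
        switch (input.charAt(index)) {
            case '&':  out.append("&amp;");  return 1;
            case '<':  out.append("&lt;");   return 1;
            case '>':  out.append("&gt;");   return 1;
            case '"':  out.append("&quot;"); return 1;
            case '\'': out.append("&apos;"); return 1;
            default:   return 0;
        }
    }

    // Driver loop: 'pos' is a char index bounded by input.length(), and it is
    // advanced explicitly instead of mutating a for-loop counter.
    static String escape(CharSequence input) {
        StringBuilder out = new StringBuilder(input.length());
        int pos = 0;
        int len = input.length();
        while (pos < len) {
            int consumed = translateAt(input, pos, out);
            if (consumed == 0) {
                // Copy the whole code point (one char, or two for a surrogate pair).
                int cp = Character.codePointAt(input, pos);
                out.appendCodePoint(cp);
                pos += Character.charCount(cp);
            } else {
                // Skip each consumed code point, any of which may span two chars.
                for (int pt = 0; pt < consumed; pt++) {
                    pos += Character.charCount(Character.codePointAt(input, pos));
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String str1 = "\uD842\uDFB7" + "A";
        System.out.println(escape(str1).equals(str1)); // true
        System.out.println(escape("a<b&c"));           // a&lt;b&amp;c
    }
}
```

Because pos always moves by Character.charCount of the code point it passes, a surrogate pair is handled as one unit, and the loop still visits every char up to input.length().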

> StringEscapeUtils.escapeXml(input) outputs wrong results when an input contains characters in Supplementary Planes.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: LANG-720
>                 URL: https://issues.apache.org/jira/browse/LANG-720
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*, lang.text.translate.*
>    Affects Versions: 3.0
>            Reporter: Taro Yabuki
>              Labels: patch
>         Attachments: CharSequenceTranslator.java.20110714.diff
>
>
> Hello.
> I use StringEscapeUtils.escapeXml(input) to escape special characters for XML.
> This method outputs wrong results when the input contains characters in Supplementary Planes.
> String str1 = "\uD842\uDFB7" + "A";
> String str2 = StringEscapeUtils.escapeXml(str1);
> // str2 should be equal to str1,
> // because str1 contains no characters that need escaping.
> // However, str2 differs from str1.
> System.out.println(URLEncoder.encode(str1, "UTF-16BE")); //%D8%42%DF%B7A
> System.out.println(URLEncoder.encode(str2, "UTF-16BE")); //%D8%42%DF%B7%FF%FD
> The cause of this problem is that the loop to translate input character by character is wrong.
> In CharSequenceTranslator.translate(CharSequence input, Writer out),
> loop counter "i" runs from 0 to Character.codePointCount(input, 0, input.length()),
> but because "i" is used as a char index into input, it should run from 0 to input.length().
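To make the diagnosis concrete, here is a minimal, self-contained sketch built on a plain pass-through copy loop. The class and method names are hypothetical, and this is not the actual CharSequenceTranslator code. copyBuggy bounds a char index by the code point count, as described in the report. copyFixed bounds it by input.length() and steps over whole code points using Character.charCount.

```java
class Lang720Demo {

    // Buggy shape: the bound is a code point count, but 'i' is used as a
    // char index and advances by one char per iteration.
    static String copyBuggy(CharSequence input) {
        StringBuilder out = new StringBuilder();
        int sz = Character.codePointCount(input, 0, input.length());
        for (int i = 0; i < sz; i++) {
            out.appendCodePoint(Character.codePointAt(input, i));
        }
        return out.toString();
    }

    // Fixed shape: the bound is input.length(), and the index advances by
    // the number of chars in the code point just copied (1 or 2).
    static String copyFixed(CharSequence input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            int cp = Character.codePointAt(input, pos);
            out.appendCodePoint(cp);
            pos += Character.charCount(cp);
        }
        return out.toString();
    }

    // Render each UTF-16 char as four hex digits, separated by spaces.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < s.length(); k++) {
            sb.append(String.format("%04X ", (int) s.charAt(k)));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String str1 = "\uD842\uDFB7" + "A";
        System.out.println(hex(copyBuggy(str1))); // D842 DFB7 DFB7
        System.out.println(hex(copyFixed(str1))); // D842 DFB7 0041
    }
}
```

For "\uD842\uDFB7A", the buggy loop runs only twice. The second iteration reads the lone low surrogate at index 1, and the trailing "A" is never reached. The resulting D842 DFB7 DFB7 is consistent with the %FF%FD in the report, where the unpaired surrogate could not be encoded and was replaced with U+FFFD.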
