You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@commons.apache.org by "Henri Yandell (JIRA)" <ji...@apache.org> on 2011/07/02 04:53:28 UTC

[jira] [Updated] (LANG-708) StringEscapeUtils.escapeEcmaScript from lang3 cuts off long unicode string

     [ https://issues.apache.org/jira/browse/LANG-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-708:
-------------------------------

    Fix Version/s: 3.x

> StringEscapeUtils.escapeEcmaScript from lang3 cuts off long unicode string
> --------------------------------------------------------------------------
>
>                 Key: LANG-708
>                 URL: https://issues.apache.org/jira/browse/LANG-708
>             Project: Commons Lang
>          Issue Type: Bug
>            Reporter: anton
>             Fix For: 3.x
>
>         Attachments: Test.java, input.txt
>
>
> Hello, I have a really big JSON string (generated from db) with unicode chars and I need to pass it though StringEscapeUtils.escapeEcmaScript(value). This string is generated and in most cases it works ok, but I have met a specific string (attached below) which is not correctly converted - few symbols (about 10) at the end of the string are cut-off (and actually they are not already unicode chars).
> the original string ends with:
>  "geonameId":6544329,"valueCode":""}]
> and the produced string ends with:
>  \"geonameId\":6544329,\"value
> So Code":""}] part is missing and this does not allow to parse the result as JSON on the client side.
> I have tried to debug a bit with StringEscapeUtils.escapeEcmaScript source code and is seems that the problem is somewhere around here:
> CharSequenceTranslator.translate(...){
> ...
>         int sz = Character.codePointCount(input, 0, input.length());
>         for (int i = 0; i < sz; i++) {
>             // consumed is the number of codepoints consumed
>             int consumed = translate(input, i, out);
>             if(consumed == 0) { 
>                 out.write( Character.toChars( Character.codePointAt(input, i) ) );
>             }
> ...
> }
> If I put breakpoint condition to stop in the loop when i==(sz-5), I can see that the last chars of "valueCode" literal are being added to the end of "out" stream, but the counter condition ends too early to reach the end of original input String.
> So, it seems that somehow with the provided string either the sz value is calculated incorrectly or the processing loop did wrong counter adjustmes at some point.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira