Posted to dev@xalan.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2016/05/14 16:45:13 UTC

[jira] [Commented] (XALANJ-2593) Incorrect showing of supplementary characters in attributes

    [ https://issues.apache.org/jira/browse/XALANJ-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283613#comment-15283613 ] 

Hudson commented on XALANJ-2593:
--------------------------------

FAILURE: Integrated in axiom-trunk #2714 (See [https://builds.apache.org/job/axiom-trunk/2714/])
Delegate conversion of unmappable characters to character references to the XmlWriter. This also ensures that we handle the scenario described in XALANJ-2593 correctly. (veithen: rev 1743836)
* axiom/aspects/core-aspects/src/main/java/org/apache/axiom/core/stream/serializer/ToStream.java
* axiom/aspects/core-aspects/src/main/java/org/apache/axiom/core/stream/serializer/writer/AbstractXmlWriter.java
* axiom/aspects/core-aspects/src/main/java/org/apache/axiom/core/stream/serializer/writer/UnmappableCharacterHandler.java
* axiom/aspects/core-aspects/src/test/java/org/apache/axiom/core/stream/serializer/SerializerTest.java
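
For readers unfamiliar with the term, "unmappable" here means a character that the output encoding cannot represent, so it has to be written as a numeric character reference. The following is a minimal sketch of that idea using only the JDK. It is not the Axiom code from rev 1743836; the class and method names are hypothetical, and it deliberately ignores markup characters such as '&' and '<'.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of Axiom or Xalan.
final class UnmappableCharDemo {

    /**
     * Copies text unchanged where the charset can encode it and replaces
     * every other code point with a decimal character reference.
     */
    static String encodeOrReference(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // Read a whole code point so a surrogate pair is never split.
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            CharSequence unit = text.subSequence(i, i + length);
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += length;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 " + new String(Character.toChars(144308));
        System.out.println(encodeOrReference(text, StandardCharsets.US_ASCII));
        System.out.println(encodeOrReference(text, StandardCharsets.UTF_8));
    }
}
```

With US-ASCII as the target, both the accented letter and the supplementary character become references; with UTF-8 the text passes through unchanged.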


> Incorrect showing of supplementary characters in attributes
> -----------------------------------------------------------
>
>                 Key: XALANJ-2593
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2593
>             Project: XalanJ2
>          Issue Type: Bug
>      Security Level: No security risk; visible to anyone(Ordinary problems in Xalan projects.  Anybody can view the issue.) 
>          Components: Serialization
>    Affects Versions: 2.7.2
>         Environment: Win 7 x64, Java 1.6 
>            Reporter: Eugene Shkel
>            Assignee: Steven J. Hathaway
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In Xalan 2.7.2, supplementary characters (see http://www.oracle.com/technetwork/articles/javase/supplementary-142654.html for details) are shown incorrectly in attributes.
> For example, I need to output the characters 𣎴 (&#144308;) and 𠘨 (&#132648;) in attribute "y" of element "x".
> Expected result: {code}<?xml version="1.0" encoding="UTF-8"?><x y="&#144308; - &#132648;"/>{code}
> Actual result for Xalan 2.7.2 is:{code} <?xml version="1.0" encoding="UTF-8"?><x y="&#55372;&#57268; - &#55361;&#56872;"/>{code}
> Code snippet for test:
> {code}
> public static void main(String[] argv) throws Exception {
>         TransformerFactory tFactory = TransformerFactory.newInstance();
>         StreamSource stylesource = new StreamSource(new StringReader("<?xml version=\"1.0\" encoding=\"UTF-8\"?><xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\" ><xsl:template match=\"/\"><x y=\"{xslt/search/value1}\" /></xsl:template></xsl:stylesheet>"));
>         Transformer transformer = tFactory.newTransformer(stylesource);
>         StreamSource source = new StreamSource(new StringReader("<?xml version=\"1.0\"?><xslt><search><value1>𣎴 - 𠘨</value1></search></xslt>"));
>         Result result = new StreamResult(System.out);
>         transformer.transform(source, result);
>     } 
> {code}
> The problem relates to the method org.apache.xml.serializer.ToStream.writeAttrString(Writer, String, String). 
> {code}
>             if (m_charInfo.shouldMapAttrChar(ch)) {
>                 // The character is supposed to be replaced by a String
>                 // e.g.   '&'  -->  "&amp;"
>                 // e.g.   '<'  -->  "&lt;"
>                 accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
>             }
> {code}
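> This branch does not handle multi-character sequences such as the surrogate pairs that represent supplementary characters in Java, so execution falls through to the following branch of the same method, which writes each surrogate as a separate character reference: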
> {code}
>             else {
>                     // This is a fallback plan, we should never get here
>                     // but if the character wasn't previously handled
>                     // (i.e. isn't in the encoding, etc.) then what
>                     // should we do?  We choose to write out a character ref
>                     writer.write("&#");
>                     writer.write(Integer.toString(ch));
>                     writer.write(';');
>                 }
> {code}
>  PS: I can't attach a patch file, so I am including the patch here.
> {code}
> --- src\org\apache\xml\serializer\ToStream.java	2014-03-26 17:21:30 +0200
> +++ src\org\apache\xml\serializer\ToStream.java	2014-09-09 19:09:30 +0300
> @@ -2112,8 +2112,13 @@
>                  // e.g.   '&'  -->  "&amp;"
>                  // e.g.   '<'  -->  "&lt;"
>                  accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
> -            }
> -            else {
> +            } else if (Encodings.isHighUTF16Surrogate(ch)) {
> +                // more than single input character can be processed
> +                // within accumDefaultEscape()
> +                // so we set appropriate value for loop for().
> +                i = accumDefaultEscape(writer, ch, i, stringChars, len, false, true); 
> +
> +            } else {
>                  if (0x0 <= ch && ch <= 0x1F) {
>                      // Range 0x00 through 0x1F inclusive
>                      // This covers the non-whitespace control characters
> {code}
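
The numbers in the quoted "Actual result" are the two UTF-16 surrogates of each supplementary character, written out one by one. The sketch below reproduces both the reported output and the expected output with plain JDK calls. It is an illustration only: the class and method names are hypothetical, it is not the Xalan serializer, and for simplicity it escapes every non-ASCII character and ignores markup characters such as '&' and '<'.

```java
// Hypothetical illustration; not Xalan code.
final class SupplementaryEscapeDemo {

    /** Escapes each UTF-16 code unit separately, as in the reported behaviour. */
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < 0x80) {
                out.append(ch);
            } else {
                // A surrogate is written as if it were a character of its own.
                out.append("&#").append((int) ch).append(';');
            }
        }
        return out.toString();
    }

    /** Escapes per Unicode code point, so a surrogate pair yields one reference. */
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by two chars when a surrogate pair was consumed.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = new String(Character.toChars(144308))
                + " - " + new String(Character.toChars(132648));
        System.out.println(escapePerChar(value));      // &#55372;&#57268; - &#55361;&#56872;
        System.out.println(escapePerCodePoint(value)); // &#144308; - &#132648;
    }
}
```

The second method advances the loop index by the number of chars consumed, which is the same idea as the patch assigning the return value of accumDefaultEscape() to i.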


