Posted to dev@xalan.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2016/05/14 16:45:13 UTC
[jira] [Commented] (XALANJ-2593) Incorrect showing of supplementary
characters in attributes
[ https://issues.apache.org/jira/browse/XALANJ-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283613#comment-15283613 ]
Hudson commented on XALANJ-2593:
--------------------------------
FAILURE: Integrated in axiom-trunk #2714 (See [https://builds.apache.org/job/axiom-trunk/2714/])
Delegate conversion of unmappable characters to character references to the XmlWriter. This also ensures that we handle the scenario described in XALANJ-2593 correctly. (veithen: rev 1743836)
* axiom/aspects/core-aspects/src/main/java/org/apache/axiom/core/stream/serializer/ToStream.java
* axiom/aspects/core-aspects/src/main/java/org/apache/axiom/core/stream/serializer/writer/AbstractXmlWriter.java
* axiom/aspects/core-aspects/src/main/java/org/apache/axiom/core/stream/serializer/writer/UnmappableCharacterHandler.java
* axiom/aspects/core-aspects/src/test/java/org/apache/axiom/core/stream/serializer/SerializerTest.java
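As a rough illustration of what converting unmappable characters to character references can look like, here is a sketch built only on java.nio.charset.CharsetEncoder.canEncode. It is not the Axiom implementation: the class and method names are hypothetical, and escaping of markup characters such as & and < is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Axiom or Xalan.
public class CharRefSketch {
    /** Replaces code points the charset cannot encode with numeric character references. */
    public static String toEncodable(String s, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            // Read a whole code point so a surrogate pair is never split.
            int codePoint = s.codePointAt(i);
            int charCount = Character.charCount(codePoint); // 2 for a surrogate pair
            CharSequence chars = s.subSequence(i, i + charCount);
            if (encoder.canEncode(chars)) {
                out.append(chars);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += charCount;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two supplementary characters from XALANJ-2593, built from their decimal code points.
        String value = new String(Character.toChars(144308)) + " - "
                + new String(Character.toChars(132648));
        System.out.println(toEncodable(value, StandardCharsets.US_ASCII));
        System.out.println(toEncodable(value, StandardCharsets.UTF_8).equals(value));
    }
}
```

With US-ASCII as the target, each supplementary character comes out as one reference to the full code point. With UTF-8 the string passes through unchanged.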
> Incorrect showing of supplementary characters in attributes
> -----------------------------------------------------------
>
> Key: XALANJ-2593
> URL: https://issues.apache.org/jira/browse/XALANJ-2593
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
> Components: Serialization
> Affects Versions: 2.7.2
> Environment: Win 7 x64, Java 1.6
> Reporter: Eugene Shkel
> Assignee: Steven J. Hathaway
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> In Xalan 2.7.2, supplementary characters (see http://www.oracle.com/technetwork/articles/javase/supplementary-142654.html for details) are shown incorrectly in attributes.
> For example, I need to show the symbols 𣎴 (& # 144308 ; ) and 𠘨 (& # 132648 ; ) in attribute "y" of element "x".
> Expected result: {code}<?xml version="1.0" encoding="UTF-8"?><x y="𣎴 - 𠘨"/>{code}
> Actual result for Xalan 2.7.2 is: {code}<?xml version="1.0" encoding="UTF-8"?><x y="&#55372;&#57268; - &#55361;&#56872;"/>{code}
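For reference, a minimal sketch of the arithmetic behind this example: it takes the two decimal code points quoted in the report and prints the UTF-16 code units that Java uses to store each one. The class and method names are made up for illustration and are not part of Xalan.

```java
// Hypothetical demo class; only java.lang.Character is used.
public class SurrogateDemo {
    /** Returns the decimal code point followed by its UTF-16 code units in decimal. */
    public static String describe(int codePoint) {
        char[] units = Character.toChars(codePoint); // 1 unit in the BMP, 2 above it
        StringBuilder sb = new StringBuilder();
        sb.append(codePoint).append(" ->");
        for (char unit : units) {
            sb.append(' ').append((int) unit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(144308)); // prints "144308 -> 55372 57268"
        System.out.println(describe(132648)); // prints "132648 -> 55361 56872"
    }
}
```

Each symbol occupies two char values (a high and a low surrogate), so code that handles one char at a time sees two separate units where the stylesheet author sees one character.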
> Code snippet for test:
> {code}
> import java.io.StringReader;
> import javax.xml.transform.Result;
> import javax.xml.transform.Transformer;
> import javax.xml.transform.TransformerFactory;
> import javax.xml.transform.stream.StreamResult;
> import javax.xml.transform.stream.StreamSource;
> public class SupplementaryAttrTest { // wrapper class name is illustrative
>     public static void main(String[] argv) throws Exception {
>         TransformerFactory tFactory = TransformerFactory.newInstance();
>         StreamSource stylesource = new StreamSource(new StringReader("<?xml version=\"1.0\" encoding=\"UTF-8\"?><xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\" ><xsl:template match=\"/\"><x y=\"{xslt/search/value1}\" /></xsl:template></xsl:stylesheet>"));
>         Transformer transformer = tFactory.newTransformer(stylesource);
>         StreamSource source = new StreamSource(new StringReader("<?xml version=\"1.0\"?><xslt><search><value1>𣎴 - 𠘨</value1></search></xslt>"));
>         Result result = new StreamResult(System.out);
>         transformer.transform(source, result);
>     }
> }
> {code}
> The problem relates to the method org.apache.xml.serializer.ToStream.writeAttrString(Writer, String, String).
> {code}
> if (m_charInfo.shouldMapAttrChar(ch)) {
> // The character is supposed to be replaced by a String
> // e.g. '&' --> "&amp;"
> // e.g. '<' --> "&lt;"
> accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
> }
> {code}
> This branch does not handle multi-character sequences such as supplementary characters, which Java represents as surrogate pairs, so execution falls through to the next part of the same method:
> {code}
> else {
> // This is a fallback plan, we should never get here
> // but if the character wasn't previously handled
> // (i.e. isn't in the encoding, etc.) then what
> // should we do? We choose to write out a character ref
> writer.write("&#");
> writer.write(Integer.toString(ch));
> writer.write(';');
> }
> {code}
> PS: I can't attach a patch file, so I'm including it here.
> {code}
> --- src\org\apache\xml\serializer\ToStream.java 2014-03-26 17:21:30 +0200
> +++ src\org\apache\xml\serializer\ToStream.java 2014-09-09 19:09:30 +0300
> @@ -2112,8 +2112,13 @@
> // e.g. '&' --> "&amp;"
> // e.g. '<' --> "&lt;"
> accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
> - }
> - else {
> + } else if (Encodings.isHighUTF16Surrogate(ch)) {
> + // more than single input character can be processed
> + // within accumDefaultEscape()
> + // so we set appropriate value for loop for().
> + i = accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
> +
> + } else {
> if (0x0 <= ch && ch <= 0x1F) {
> // Range 0x00 through 0x1F inclusive
> // This covers the non-whitespace control characters
> {code}
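The effect of the patch can be illustrated outside Xalan with a simplified sketch. This is not the ToStream code: the class and method names are hypothetical, and for simplicity every non-ASCII character is written as a numeric reference regardless of the output encoding. The first method escapes one char at a time, as the fallback branch does. The second combines a surrogate pair into a single code point and moves the loop index past the low surrogate, which is the idea the patch comment describes when it assigns the return value of accumDefaultEscape() to i.

```java
// Hypothetical sketch for illustration; not the org.apache.xml.serializer.ToStream code.
public class AttrEscapeSketch {
    /** Per-char fallback: one numeric reference per UTF-16 code unit. */
    public static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < 0x80) {
                out.append(ch);
            } else {
                out.append("&#").append((int) ch).append(';');
            }
        }
        return out.toString();
    }

    /** Pair-aware variant: one numeric reference per code point. */
    public static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (Character.isHighSurrogate(ch) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                int codePoint = Character.toCodePoint(ch, s.charAt(i + 1));
                out.append("&#").append(codePoint).append(';');
                i++; // two input chars were consumed, so skip the low surrogate
            } else if (ch < 0x80) {
                out.append(ch);
            } else {
                out.append("&#").append((int) ch).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Attribute value from the report, built from the decimal code points it quotes.
        String value = new String(Character.toChars(144308)) + " - "
                + new String(Character.toChars(132648));
        System.out.println(escapePerChar(value));      // &#55372;&#57268; - &#55361;&#56872;
        System.out.println(escapePerCodePoint(value)); // &#144308; - &#132648;
    }
}
```

The per-char output refers to the surrogate halves individually, which are not valid XML characters. The pair-aware output refers to the intended code points.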
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)