Posted to dev@xalan.apache.org by "Jesper Steen Møller (JIRA)" <ji...@apache.org> on 2015/08/11 14:39:46 UTC

[jira] [Comment Edited] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

    [ https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681738#comment-14681738 ] 

Jesper Steen Møller edited comment on XALANJ-2419 at 8/11/15 12:38 PM:
-----------------------------------------------------------------------

I followed the instructions on https://xalan.apache.org/xalan-j/downloads.html#buildmyself on my Mac OS X 10.10.4 with Xcode developer tools installed.
I had to add execute permission to test/build.sh and temporarily change my locale to "All American" (otherwise the test "Extension test of javaSample3.xsl" fails).

That worked, and I got 2 x CONGRATULATIONS when running "./build.sh smoketest".

I then applied the patch containing the tests (using svn patch), and then *ToStreamTest.runTest()* and *StreamResultAPITest.runTest()* both failed, as was expected.

I then applied the fix, and the tests were once again OK.

So, yes, the fix still applies.

This was on Java 1.7.

Hope this helps!


was (Author: jespersm):
I followed the instructions on https://xalan.apache.org/xalan-j/downloads.html#buildmyself on my Mac OS X 10.10.4 with Xcode developer tools installed.
I had to add execute permissions on test/build.sh, and temporarily change my locale to "All American" (or the test "Extension test of javaSample3.xsl" fails)

That worked, and I got 2 x CONGRATULATIONS

I then applied the tests-patch (using svn patch), and then ToStreamTest.runTest() and StreamResultAPITest.runTest() both failed, as was expected.

I then applied the fix, and the tests were once again OK.

So, yes, the fix still applies.

This was on Java 1.7.

Hope this helps!

> Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8
> ---------------------------------------------------------------------------------------------
>
>                 Key: XALANJ-2419
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2419
>             Project: XalanJ2
>          Issue Type: Bug
>          Components: Serialization
>    Affects Versions: 2.7.1
>            Reporter: Henri Sivonen
>         Attachments: XALANJ-2419-fix.txt, XALANJ-2419-tests.txt
>
>
> org.apache.xml.serializer.ToStream contains the following code:
>                     else if (m_encodingInfo.isInEncoding(ch)) {
>                         // If the character is in the encoding, and
>                         // not in the normal ASCII range, we also
>                         // just leave it get added on to the clean characters
>                         
>                     }
>                     else {
>                         // This is a fallback plan, we should never get here
>                         // but if the character wasn't previously handled
>                         // (i.e. isn't in the encoding, etc.) then what
>                         // should we do?  We choose to write out an entity
>                         writeOutCleanChars(chars, i, lastDirtyCharProcessed);
>                         writer.write("&#");
>                         writer.write(Integer.toString(ch));
>                         writer.write(';');
>                         lastDirtyCharProcessed = i;
>                     }
> As a result, surrogates end up in the wrong (final else) branch, because isInEncoding() for UTF-8 returns false for surrogates. Escaping a surrogate as an NCR is always wrong, regardless of encoding.
> The practical effect of this bug is that any document containing astral characters is serialized in an ill-formed way and cannot be parsed back by an XML parser.
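
To illustrate the difference described in the issue, here is a minimal, self-contained Java sketch. It is not the code from XALANJ-2419-fix.txt and does not use any Xalan classes; the class SurrogateNcrSketch and both methods are hypothetical names made up for this example. For simplicity it escapes every non-ASCII character as a decimal NCR, whereas a real UTF-8 serializer would normally write such characters directly. The point is only the pairing step: a high/low surrogate pair has to be combined into one code point before it is written as a character reference.

```java
public class SurrogateNcrSketch {

    /**
     * Mimics the behaviour described in the issue: each UTF-16 code unit
     * is escaped on its own, so an astral character becomes two NCRs
     * that carry surrogate values (ill-formed XML).
     */
    public static String escapePerChar(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch < 0x80) {
                out.append(ch);
            } else {
                out.append("&#").append((int) ch).append(';');
            }
        }
        return out.toString();
    }

    /**
     * Hypothetical corrected variant: a surrogate pair is first combined
     * into a single code point, which is then written as one NCR.
     */
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char ch = text.charAt(i);
            if (Character.isHighSurrogate(ch)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                // Combine the pair into the real (astral) code point.
                int codePoint = Character.toCodePoint(ch, text.charAt(i + 1));
                out.append("&#").append(codePoint).append(';');
                i += 2;
            } else if (Character.isSurrogate(ch)) {
                // Assumption of this sketch: a lone surrogate is an error,
                // since it cannot be represented in well-formed XML.
                throw new IllegalArgumentException(
                        "Unpaired surrogate at index " + i);
            } else if (ch < 0x80) {
                out.append(ch);
                i++;
            } else {
                out.append("&#").append((int) ch).append(';');
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String astral = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        System.out.println(escapePerChar(astral));  // &#55357;&#56832;
        System.out.println(escapeNonAscii(astral)); // &#128512;
    }
}
```

The first line of output is the kind of serialization the issue reports; the second is a single reference to the actual code point, which an XML parser accepts.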


