Posted to common-issues@hadoop.apache.org by "Chris Nauroth (JIRA)" <ji...@apache.org> on 2013/07/30 19:39:51 UTC

[jira] [Commented] (HADOOP-9801) Configuration#writeXml uses platform default encoding, which may mishandle multi-byte characters.

    [ https://issues.apache.org/jira/browse/HADOOP-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724141#comment-13724141 ] 

Chris Nauroth commented on HADOOP-9801:
---------------------------------------

Thanks to [~daijy] for finding and reporting this bug via Hive testing on Windows, where the default encoding is CP-1252.
                
> Configuration#writeXml uses platform default encoding, which may mishandle multi-byte characters.
> -------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9801
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9801
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>    Affects Versions: 3.0.0, 1-win, 2.1.0-beta, 1.3.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>
> The overload of {{Configuration#writeXml}} that accepts an {{OutputStream}} does not set an encoding explicitly, so it falls back to the platform default encoding.  Depending on what that default is, multi-byte characters can be written incorrectly.
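
To illustrate the failure mode described above, here is a minimal sketch. It is not the Hadoop code or the patch for this issue. The class name EncodingSketch and the helper encode are hypothetical, and the charset name "windows-1252" is assumed to be available in the running JVM as a stand-in for the CP-1252 default on Windows. It writes the same string through an OutputStreamWriter once with UTF-8 and once with windows-1252, and checks whether the bytes decode back to the original text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from Hadoop's Configuration class.
public class EncodingSketch {

    // Writes text to an in-memory stream through a Writer bound to the given charset.
    public static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Returns true if the text survives an encode/decode cycle in the given charset.
    public static boolean roundTrips(String text, Charset charset) {
        return new String(encode(text, charset), charset).equals(text);
    }

    public static void main(String[] args) {
        // Two CJK characters; each needs three bytes in UTF-8 and has no
        // mapping in windows-1252.
        String value = "\u65e5\u672c";

        // Assumption: the JVM provides windows-1252 (it is not one of the
        // charsets every Java platform is required to support).
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println("UTF-8 round trip:        " + roundTrips(value, StandardCharsets.UTF_8));
        System.out.println("windows-1252 round trip: " + roundTrips(value, cp1252));
        System.out.println("platform default:        " + Charset.defaultCharset());
    }
}
```

Passing the charset explicitly, as encode does, makes the output independent of the platform default. Constructing an OutputStreamWriter without a charset argument would pick up whatever Charset.defaultCharset() reports on that machine.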
