Posted to issues@trafodion.apache.org by "Hans Zeller (JIRA)" <ji...@apache.org> on 2017/03/01 22:25:45 UTC

[jira] [Commented] (TRAFODION-2515) Question mark instead of Unicode replacement character is used

    [ https://issues.apache.org/jira/browse/TRAFODION-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891191#comment-15891191 ] 

Hans Zeller commented on TRAFODION-2515:
----------------------------------------

The problem I saw while debugging is that we call unicodeTocset() for this invalid UCS-2/UTF-16 string, which in turn calls UTF16ToLocale() without passing a substitution character. UTF16ToLocale() then calls csc_get_subst_char(), but that method expects the substitution character as an input. I don't know why that is: we also pass the character set to this method, so it could easily look up the substitution character itself (unless that would violate some design principle, for example if the needed lookup table lives in a higher layer).
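
To make the idea concrete, here is a minimal Python sketch of a converter that falls back to a per-character-set default when the caller passes no substitution character. This is not Trafodion code: DEFAULT_SUBST, get_subst_char() and utf16_units_to_charset() are hypothetical names made up for illustration, and the table contents are only the defaults described in this issue.

```python
# Hypothetical sketch, NOT Trafodion code: all names below are assumptions.
# Default substitution character per target character set, as described in
# this issue: "?" for ASCII and ISO-8859-1, U+FFFD for Unicode targets.
DEFAULT_SUBST = {
    "ascii": "?",
    "latin-1": "?",
    "utf-8": "\ufffd",
}


def get_subst_char(charset, subst=None):
    """Return the caller's substitution character, or the charset default."""
    if subst is not None:
        return subst
    return DEFAULT_SUBST[charset]


def utf16_units_to_charset(units, charset, subst=None):
    """Convert a list of UTF-16 code units to bytes in `charset`.

    Unpaired surrogates are replaced by the substitution character.
    """
    repl = get_subst_char(charset, subst)
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                out.append(chr(cp))
                i += 2
                continue
            out.append(repl)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate without a preceding high surrogate.
            out.append(repl)
        else:
            out.append(chr(u))
        i += 1
    # Valid characters that the target cannot represent are left to Python's
    # own "replace" handler, which emits "?" for them.
    return "".join(out).encode(charset, errors="replace")


# No substitution character passed: the charset default is looked up.
print(utf16_units_to_charset([0xD834, 0x0041], "utf-8").hex().upper())
# Forcing "?" yields the output reported in this issue.
print(utf16_units_to_charset([0xD834, 0x0041], "utf-8", subst="?").hex().upper())
```

With a lookup like this, callers such as unicodeTocset() would not need to know the substitution character at all. Whether such a table can live at that layer in the real code is exactly the open question above.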

> Question mark instead of Unicode replacement character is used
> --------------------------------------------------------------
>
>                 Key: TRAFODION-2515
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2515
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-general
>    Affects Versions: 2.0-incubating
>            Reporter: Hans Zeller
>            Priority: Minor
>
> When we convert text to a character set and encounter an invalid character, we should translate it into the "replacement character" of that character set. For ASCII and ISO-8859-1, we just use a question mark, since there is no special replacement character. When we convert to Unicode, however, we should use U+FFFD as the replacement character (often displayed as a black diamond with a question mark inside).
> Test case:
> cqd TRANSLATE_ERROR 'off';
> select converttohex(TRANSLATE(_ucs2 X'D8340041' using UCS2toUTF8)) from (values(0))x;
> The source value is an invalid bit pattern (D834, a high surrogate that is not followed by a low surrogate) followed by "A" (0041). Right now the result shows 3F41 as the output; as Unicode or ASCII text this is "?A". With the correct replacement character, the result should be EFBFBD41, with EFBFBD being the UTF-8 encoding of U+FFFD.
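
As an independent cross-check of the expected bytes, the same input can be run through Python's built-in codecs. This only illustrates the intended result with a different implementation; it says nothing about how Trafodion performs the conversion, and the choice of big-endian UTF-16 is an assumption based on how the hex literal is written.

```python
# Cross-check with Python's standard codecs (not Trafodion code).
# Assumption: X'D8340041' is read as big-endian UTF-16 code units.
source = bytes.fromhex("D8340041")

# errors="replace" substitutes U+FFFD for the unpaired surrogate D834.
decoded = source.decode("utf-16-be", errors="replace")

result = decoded.encode("utf-8").hex().upper()
print(result)  # EFBFBD41
```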



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)