Posted to issues-all@impala.apache.org by "Moxuan Shi (Jira)" <ji...@apache.org> on 2019/12/02 03:11:00 UTC
[jira] [Commented] (IMPALA-9205) UDF function in impala received chinese character change to???
[ https://issues.apache.org/jira/browse/IMPALA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985784#comment-16985784 ]
Moxuan Shi commented on IMPALA-9205:
------------------------------------
Why does substring support non-ASCII characters, then?
{code:java}
+-----------------------+
| substring(name, 1, 3) |
+-----------------------+
| 汪 |
| 华 |
+-----------------------+
{code}
[~jbapple]
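A note on the result above: it is consistent with substring counting bytes rather than characters, since each of these Chinese characters takes 3 bytes in UTF-8, so 3 bytes yield exactly one character. The following is a minimal sketch of that behavior in plain Java, not Impala code. The class ByteSubstringDemo and the helper byteSubstring are hypothetical names introduced only for illustration, and the assumption that Impala 2.12 string functions work on raw bytes is not confirmed anywhere in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteSubstringDemo {
    // Hypothetical helper: a 1-based substring that counts UTF-8 bytes, not characters.
    static String byteSubstring(String s, int start, int length) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // Clamp the requested byte range to the bounds of the array.
        int from = Math.min(Math.max(start - 1, 0), utf8.length);
        int to = Math.min(from + Math.max(length, 0), utf8.length);
        return new String(Arrays.copyOfRange(utf8, from, to), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u4e00\u4e8c\u4e09" is the test string from the issue, written with
        // Unicode escapes so the source file encoding does not matter.
        String s = "\u4e00\u4e8c\u4e09";
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 9 bytes for 3 characters
        System.out.println(byteSubstring(s, 1, 3)); // the first character only
    }
}
```

Under this assumption, a byte range that does not end on a character boundary (for example a length of 4) would cut a character in half, which is one way to check whether a given function is byte-based.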
> UDF function in impala received chinese character change to???
> --------------------------------------------------------------
>
> Key: IMPALA-9205
> URL: https://issues.apache.org/jira/browse/IMPALA-9205
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 2.12.0
> Environment: CentOS 7.3
> Hive 1.2
> Impala 2.12
> Java JDK 1.8
> Python 2.7.5
> Reporter: Moxuan Shi
> Priority: Major
>
> The UDF works in Hive, but not in Impala.
>
> {code:java}
> select leftcutcontentudf("一二三",2);
> OK
> 一二
> {code}
> [work in hive|https://i.stack.imgur.com/pdCzU.png]
>
> {code:java}
> select leftcutcontentudf("一二三",2);
> +----------------------------------------+
> | default.leftcutcontentudf('一二三', 2) |
> +----------------------------------------+
> | ?? |
> +----------------------------------------+
> {code}
> [chinese character changed to ?? in impala|https://i.stack.imgur.com/QU5Gx.png]
>
> I wrote a new UDF that prints the bytes of the input String:
> {code:java}
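> import org.apache.hadoop.hive.ql.exec.UDF;
>
> public class GetBytes extends UDF {
>     // Returns the bytes of the input as space-separated signed decimal values.
>     public String evaluate(String input) {
>         // getBytes() with no argument encodes with the platform default charset.
>         byte[] bytes = input.getBytes();
>         StringBuilder builder = new StringBuilder();
>         for (byte b : bytes) {
>             builder.append(b).append(" ");
>         }
>         return builder.toString();
>     }
> }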
> {code}
> It seems that the Chinese characters are changed to ??? before the UDF is called.
> {code:java}
> select getbytes("一二三");
> {code}
> {code:java}
> +-----------------------------+
> | default.getbytes('一二三') |
> +-----------------------------+
> | 63 63 63 63 63 63 63 63 63 |
> +-----------------------------+
> {code}
> [GetBytes result|https://i.stack.imgur.com/wVHT6.png]
>
> But a normal query returns the correct result in Impala.
> {code:java}
> select khmc_62c57e8ae0ac from collective_2085;
> +-------------------+
> | khmc_62c57e8ae0ac |
> +-------------------+
> | 淘宝 |
> +-------------------+
> {code}
> [correct query|https://i.stack.imgur.com/euq79.png]
> How can I deal with this problem?
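
The GetBytes output in the issue description (nine values of 63, the ASCII code of '?') can be reproduced in plain Java if the 9 UTF-8 bytes of the input are decoded with a charset that cannot represent them and then encoded again. The sketch below assumes, without confirmation from this thread, that the JVM inside Impala uses a US-ASCII-like default charset when it builds the String for the UDF. The class ReplacementDemo and the method roundTripThroughAscii are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Decodes the UTF-8 bytes of s as US-ASCII, re-encodes the result as US-ASCII,
    // and returns the resulting byte values separated by spaces.
    static String roundTripThroughAscii(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // Every byte >= 0x80 is malformed in US-ASCII and decodes to U+FFFD.
        String decoded = new String(utf8, StandardCharsets.US_ASCII);
        // U+FFFD cannot be encoded in US-ASCII and is replaced by '?' (63).
        byte[] out = decoded.getBytes(StandardCharsets.US_ASCII);
        StringBuilder builder = new StringBuilder();
        for (byte b : out) {
            builder.append(b).append(" ");
        }
        return builder.toString().trim();
    }

    public static void main(String[] args) {
        // The test string from the issue, written with Unicode escapes.
        System.out.println(roundTripThroughAscii("\u4e00\u4e8c\u4e09"));
        // prints: 63 63 63 63 63 63 63 63 63
    }
}
```

If this assumption holds, returning Charset.defaultCharset().name() from a small test UDF would show which charset the JVM is actually using, and the original bytes are already lost by the time evaluate(String) runs.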