Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/03/13 02:25:00 UTC

[jira] [Commented] (IMPALA-10523) Impala-shell crash in printing error messages that contain UTF-8 characters

    [ https://issues.apache.org/jira/browse/IMPALA-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300687#comment-17300687 ] 

ASF subversion and git services commented on IMPALA-10523:
----------------------------------------------------------

Commit d5f67fce41a919b9904c70e53740d8bfd5831e71 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d5f67fc ]

IMPALA-10523: Fix impala-shell crash in printing error messages that contain UTF-8 characters

In Python2, print() converts all non-keyword arguments to strings as
str() does and writes them to the stream. str() on QueryStateException
returns its value (i.e. the error message), which may be of unicode
type. Python2 then implicitly encodes it to str using the default
encoding, 'ascii'. This raises UnicodeEncodeError when the error
message contains non-ASCII characters.
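
The implicit step can be reproduced in isolation. The sketch below is
illustrative only: the sample string is taken from the query in the
report, the variable names are hypothetical, and the 'ascii' encode that
Python2 applies implicitly is written out as an explicit call (which
fails the same way under Python3).

```python
# -*- coding: utf-8 -*-
# Illustrative only: spell out the encode that Python2 performs
# implicitly when __str__ returns a unicode object.
message = u"yyyy年MM月dd日"

try:
    message.encode("ascii")
    failed = False
except UnicodeEncodeError as err:
    # err.start is the index of the first character that cannot be encoded.
    failed = True
    bad_char = message[err.start]

# Encoding with 'utf-8' succeeds and round-trips losslessly.
encoded = message.encode("utf-8")
```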

This patch explicitly encodes the error message using 'utf-8' encoding
if it's in unicode type and the shell is run in Python2.
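
A minimal sketch of that approach, not the actual patch. The helper name
printable_error is hypothetical, and the exception class is copied from
the issue description so that the example is self-contained.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import sys


class QueryStateException(Exception):
    """Same shape as the class quoted in the issue description."""

    def __init__(self, value=""):
        self.value = value

    def __str__(self):
        return self.value


def printable_error(exc):
    """Hypothetical helper: return text that print() can write safely."""
    msg = exc.value
    # Only Python2 has a separate unicode type that must be encoded to str.
    # The name `unicode` is never evaluated under Python3 (short-circuit).
    if sys.version_info.major == 2 and isinstance(msg, unicode):  # noqa: F821
        msg = msg.encode("utf-8")
    return msg


err = QueryStateException(u"bad format: yyyy年MM月dd日")
print(printable_error(err), file=sys.stderr)
```

Under Python3 the helper returns the message unchanged, since str is
already a text type there.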

Tests:
 - Add test in test_shell_interactive.py

Change-Id: Ie10f5b03ecc5877053c2fbada1afaf256b423a71
Reviewed-on: http://gerrit.cloudera.org:8080/17099
Reviewed-by: Tamas Mate <tm...@cloudera.com>
Reviewed-by: Laszlo Gaal <la...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Impala-shell crash in printing error messages that contain UTF-8 characters
> ---------------------------------------------------------------------------
>
>                 Key: IMPALA-10523
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10523
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Clients
>    Affects Versions: Impala 4.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> I encountered a crash in impala-shell while experimenting with a query:
> {code}
> [localhost:21050] default> select cast(now() as string format 'yyyy年MM月dd日');
> Query: select cast(now() as string format 'yyyy年MM月dd日')
> Query submitted at: 2021-02-20 16:40:09 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=6c4d64dec01254bc:d54107fd00000000
> Traceback (most recent call last):
>   File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 2070, in <module>
>     impala_shell_main()
>   File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 2035, in impala_shell_main
>     shell.cmdloop(intro)
>   File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/cmd.py", line 142, in cmdloop
>     stop = self.onecmd(line)
>   File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 697, in onecmd
>     return func(arg)
>   File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 1123, in do_select
>     return self._execute_stmt(query_str, print_web_link=True)
>   File "/home/quanlong/workspace/Impala/shell/impala_shell.py", line 1320, in _execute_stmt
>     print(e, file=sys.stderr)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u5e74' in position 44: ordinal not in range(128)
> {code}
> The crash point is at shell/impala_shell.py:1320:
> {code:python}
> 1316     except QueryStateException as e:
> 1317       # an exception occurred while executing the query
> 1318       if self.last_query_handle is not None:
> 1319         self.imp_client.close_query(self.last_query_handle)
> 1320       print(e, file=sys.stderr)
> {code}
> Definition of QueryStateException in shell/shell_exceptions.py:
> {code:python}
> class QueryStateException(Exception):
>   def __init__(self, value=""):
>     self.value = value
>
>   def __str__(self):
>     return self.value
> {code}
> After IMPALA-9489, the {{value}} of QueryStateException is of unicode type under Python2, because the shell follows the "unicode sandwich" approach: "bytes on the outside, unicode on the inside, encode/decode at the edges". We should encode it to str explicitly using 'utf-8', instead of letting Python encode it implicitly with 'ascii' and fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
