Posted to dev@thrift.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2020/10/30 12:20:00 UTC
[jira] [Commented] (THRIFT-5303) Unicode decode errors in _fast_decode
[ https://issues.apache.org/jira/browse/THRIFT-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223611#comment-17223611 ]
Quanlong Huang commented on THRIFT-5303:
----------------------------------------
PR for the fix: [https://github.com/apache/thrift/pull/2269]
Verified that it fixes the issue in the Impala client.
> Unicode decode errors in _fast_decode
> -------------------------------------
>
> Key: THRIFT-5303
> URL: https://issues.apache.org/jira/browse/THRIFT-5303
> Project: Thrift
> Issue Type: Bug
> Components: Python - Library
> Affects Versions: 0.11.0
> Environment: Ubuntu 16.04.6 LTS
> Reporter: Quanlong Huang
> Priority: Major
>
> Impala currently uses thrift-0.11.0 on the client side and thrift-0.9.3 on the server side (the server-side upgrade is blocked by some issues). We encountered an error when decoding UTF-8 bytes on the client side: the result contains a partial UTF-8 code point, and Thrift does not handle the error gracefully. The stack trace:
> {code:java}
> Traceback (most recent call last):
> File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1210, in _do_beeswax_rpc
> ret = rpc()
> File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1113, in <lambda>
> self.fetch_size))
> File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 254, in fetch
> return self.recv_fetch()
> File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 275, in recv_fetch
> result.read(iprot)
> File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 1410, in read
> iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xe6 in position 3: unexpected end of data {code}
> This is similar to THRIFT-2087, but the error happens at the boundary between the Python and C++ code. As in THRIFT-2087, we need to provide an error handling behavior for decoding UTF-8 bytes in {{TBinaryProtocolAccelerated._fast_decode}}. The relevant code is at [https://github.com/apache/thrift/blob/0.11.0/lib/py/src/ext/protocol.tcc#L708]:
> {code:c++}
> case T_STRING: {
> char* buf = NULL;
> int len = impl()->readString(&buf);
> if (len < 0) {
> return NULL;
> }
> if (isUtf8(typeargs)) {
> return PyUnicode_DecodeUTF8(buf, len, 0); // <--- an error handling mode needs to be passed here
> } else {
> return PyBytes_FromStringAndSize(buf, len);
> }
> }
> {code}
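For illustration, the failure and the effect of an error handling mode can be reproduced in pure Python. This is only a minimal sketch, not the code from the PR: the sample bytes and the helper name decode_field are hypothetical, and it assumes that PyUnicode_DecodeUTF8(buf, len, errors) behaves like bytes.decode("utf-8", errors), with a NULL (0) errors argument meaning "strict".

```python
# Hypothetical sample: "abc" followed by only the first byte of a 3-byte
# UTF-8 sequence (U+6C49 encodes as E6 B1 89), i.e. a truncated string.
truncated = b"abc" + "\u6c49".encode("utf-8")[:1]


def decode_field(raw, errors="strict"):
    """Decode the raw bytes of a string field as UTF-8.

    decode_field is an illustrative helper, not a Thrift API.
    """
    return raw.decode("utf-8", errors)


# Strict decoding (what passing 0/NULL as the errors argument amounts to)
# raises on the partial code point.
try:
    decode_field(truncated)
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# With an explicit error handling mode the call returns a string instead.
replaced = decode_field(truncated, errors="replace")  # U+FFFD marks the bad byte
ignored = decode_field(truncated, errors="ignore")    # the bad byte is dropped

if __name__ == "__main__":
    print(strict_error)
    print(repr(replaced))
    print(repr(ignored))
```

Which mode to use is a design choice: "replace" keeps the position of the damaged bytes visible in the result, while "ignore" drops them silently.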
--
This message was sent by Atlassian Jira
(v8.3.4#803005)