Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2020/11/02 02:05:05 UTC
[native-toolchain-CR] IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16688 )
Change subject: IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
......................................................................
IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
After we bumped the thrift version that impala-shell depends on to
0.11.0, we hit some bugs in decoding malformed UTF-8 characters, which
crash impala-shell or cause it to hang forever. Before the thrift
version bump, impala-shell was able to print incomplete UTF-8 characters
as replacement symbols, e.g.
impala-shell> select substr("引擎", 1, 4);
引�
impala-shell> select unhex("aa");
�
The cause is that thrift changed its internal string representation
from bytes to unicode after 0.10 (THRIFT-3503) to support Python 3, which
follows the "unicode sandwich" rule -- namely "bytes on the outside,
unicode on the inside, encode/decode at the edges". However, no error
handling method is specified for the decoding step, so malformed input
triggers a decoding error. We need the patches from THRIFT-2087 and
THRIFT-5303 to improve its robustness. THRIFT-5303 is enough to resolve
the issue we hit since we mostly use the _fast_decode code path.
THRIFT-2087 is backported as well in case we use the normal decoding
code path somewhere.
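The decoding behaviour described above can be reproduced in plain
Python 3. This is only a minimal sketch of strict versus replacing
UTF-8 decoding, not the code from the thrift patches; the helper names
decode_strict and decode_lenient are hypothetical and exist only for
illustration.

```python
# Illustrative sketch only; this is not the code from the thrift patches.


def decode_strict(raw: bytes) -> str:
    """Decode with the default 'strict' handler; raises on malformed input."""
    return raw.decode("utf-8")


def decode_lenient(raw: bytes) -> str:
    """Decode with errors='replace'; malformed bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# First 4 bytes of a 6-byte string: one complete character plus a
# dangling lead byte, similar to substr("引擎", 1, 4) in the example above.
truncated = "引擎".encode("utf-8")[:4]
# A single byte that is not valid UTF-8 on its own, similar to unhex("aa").
lone_byte = bytes.fromhex("aa")

for raw in (truncated, lone_byte):
    try:
        decode_strict(raw)
    except UnicodeDecodeError as exc:
        print("strict decode failed:", exc.reason)
    # ascii() keeps the output printable on terminals without UTF-8 support.
    print("lenient decode:", ascii(decode_lenient(raw)))
```

With the replacing handler, the malformed bytes come back as the
U+FFFD replacement character instead of raising, which matches the
output impala-shell printed before the version bump.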
Tests:
- Verify the issue is resolved after bumping the thrift version that
impala-shell depends on to 0.11.0-p4.
Change-Id: Id16b04248f2db3033bef3ab26b7ba8205768c9af
---
M buildall.sh
A source/thrift/thrift-0.11.0-patches/0003-THRIFT-2087-Python-compiler-replace-non-utf-8-char-w.patch
A source/thrift/thrift-0.11.0-patches/0004-THRIFT-5303-Fix-missing-error-handling-in-using-PyUn.patch
3 files changed, 55 insertions(+), 1 deletion(-)
git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/88/16688/1
--
To view, visit http://gerrit.cloudera.org:8080/16688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id16b04248f2db3033bef3ab26b7ba8205768c9af
Gerrit-Change-Number: 16688
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
[native-toolchain-CR] IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16688 )
Change subject: IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
......................................................................
Patch Set 1: Verified+1
--
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id16b04248f2db3033bef3ab26b7ba8205768c9af
Gerrit-Change-Number: 16688
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 04 Nov 2020 07:23:35 +0000
Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16688 )
Change subject: IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
......................................................................
IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
After we bumped the thrift version that impala-shell depends on to
0.11.0, we hit some bugs in decoding malformed UTF-8 characters, which
crash impala-shell or cause it to hang forever. Before the thrift
version bump, impala-shell was able to print incomplete UTF-8 characters
as replacement symbols, e.g.
impala-shell> select substr("引擎", 1, 4);
引�
impala-shell> select unhex("aa");
�
The cause is that thrift changed its internal string representation
from bytes to unicode after 0.10 (THRIFT-3503) to support Python 3, which
follows the "unicode sandwich" rule -- namely "bytes on the outside,
unicode on the inside, encode/decode at the edges". However, no error
handling method is specified for the decoding step, so malformed input
triggers a decoding error. We need the patches from THRIFT-2087 and
THRIFT-5303 to improve its robustness. THRIFT-5303 is enough to resolve
the issue we hit since we mostly use the _fast_decode code path.
THRIFT-2087 is backported as well in case we use the normal decoding
code path somewhere.
Tests:
- Verify the issue is resolved after bumping the thrift version that
impala-shell depends on to 0.11.0-p4.
Change-Id: Id16b04248f2db3033bef3ab26b7ba8205768c9af
Reviewed-on: http://gerrit.cloudera.org:8080/16688
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Tested-by: Quanlong Huang <hu...@gmail.com>
---
M buildall.sh
A source/thrift/thrift-0.11.0-patches/0003-THRIFT-2087-Python-compiler-replace-non-utf-8-char-w.patch
A source/thrift/thrift-0.11.0-patches/0004-THRIFT-5303-Fix-missing-error-handling-in-using-PyUn.patch
3 files changed, 55 insertions(+), 1 deletion(-)
Approvals:
Csaba Ringhofer: Looks good to me, approved
Quanlong Huang: Verified
--
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id16b04248f2db3033bef3ab26b7ba8205768c9af
Gerrit-Change-Number: 16688
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
[native-toolchain-CR] IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16688 )
Change subject: IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
......................................................................
Patch Set 1: Code-Review+2
--
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id16b04248f2db3033bef3ab26b7ba8205768c9af
Gerrit-Change-Number: 16688
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Nov 2020 16:04:16 +0000
Gerrit-HasComments: No