Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2021/01/04 05:11:48 UTC

[Impala-ASF-CR] [WIP] IMPALA-2019(Part-1): Provide UTF-8 support in length, substring and reverse functions

Quanlong Huang has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16908 )

Change subject: [WIP] IMPALA-2019(Part-1): Provide UTF-8 support in length, substring and reverse functions
......................................................................

[WIP] IMPALA-2019(Part-1): Provide UTF-8 support in length, substring and reverse functions

Add a query option, UTF8_MODE, to turn UTF-8 aware behavior on or off.
Add UTF-8 aware versions of length(), substring() and reverse(). Other
functions will be added in later patches.

Implementation:
Introduce a new flag, is_utf8, in type descriptors. Pass it down from
the FE when analyzing a FunctionCallExpr, and check it in the string
functions that support UTF-8 mode.
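
The flag-passing idea can be sketched in Python as follows. This is an
illustration only, not the patch's code: TypeDesc,
analyze_function_call and length are hypothetical stand-ins for the
type descriptor, the frontend analysis step and a backend string
function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeDesc:
    """Hypothetical stand-in for a type descriptor carrying the is_utf8 flag."""
    name: str
    is_utf8: bool = False


def analyze_function_call(arg_type_name, utf8_mode):
    """Frontend side (sketch): set is_utf8 on STRING arguments when UTF8_MODE is on."""
    return TypeDesc(arg_type_name, is_utf8=utf8_mode and arg_type_name == "STRING")


def length(data, arg_type):
    """Backend side (sketch): choose the behavior based on the descriptor's flag."""
    if arg_type.is_utf8:
        # Count characters: skip continuation bytes, whose top two bits are 10.
        return sum(1 for b in data if (b & 0xC0) != 0x80)
    # Byte-oriented behavior when the flag is off.
    return len(data)
```

With this shape, the string function itself does not need to know about
the query option. It only inspects the descriptor it receives.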

Tests:
 - Expose the UTF-8 aware versions of the string functions as builtin
   functions (named utf8_*). Add BE tests for them.
 - Add e2e tests for the UTF8_MODE query option.
 - TODO: Add a table with UTF-8 strings and use string functions on it.

Change-Id: I0aaf3544e89f8a3d531ad6afe056b3658b525b7c
---
M be/src/exprs/anyval-util.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/udf/udf-internal.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M be/src/util/bit-util.h
M common/function-registry/impala_functions.py
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
A testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
A tests/query_test/test_utf8_strings.py
21 files changed, 335 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16908/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16908

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0aaf3544e89f8a3d531ad6afe056b3658b525b7c
Gerrit-Change-Number: 16908
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>