You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoram Thanga (Code Review)" <ge...@cloudera.org> on 2017/11/29 00:22:05 UTC

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Zoram Thanga has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8349


Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................

IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of
characters.

This patch generalizes ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])

A common string trimming method has been added to StringFunctions,
which is called from the general ltrim/rtrim/btrim functions. The
functions also share prepare and close operations.

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 166 insertions(+), 77 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8349/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 3
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................

IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of
characters.

This patch generalizes ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])

A common string trimming method has been added to StringFunctions,
which is called from the general ltrim/rtrim/btrim functions. The
functions also share prepare and close operations.

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Testing: Queries like the following were run on a 1.5-billion row
tpch_parquet.lineitem table, with the old and new implementations
to ensure there is no performance regression:

  1. select count(trim(l_shipinstruct)), count(trim(l_returnflag)), ...
  2. select count(*) from t where trim(l_shipinstruct) = '' and ...

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Reviewed-on: http://gerrit.cloudera.org:8080/8349
Reviewed-by: Michael Ho <kw...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 178 insertions(+), 88 deletions(-)

Approvals:
  Michael Ho: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 13
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8349

to look at the new patch set (#8).

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................

IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of
characters.

This patch generalizes ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])

A common string trimming method has been added to StringFunctions,
which is called from the general ltrim/rtrim/btrim functions. The
functions also share prepare and close operations.

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Testing: Queries like the following were run on a 1.5-billion row
tpch_parquet.lineitem table, with the old and new implementations
to ensure there is no performance regression:

  1. select count(trim(l_shipinstruct)), count(trim(l_returnflag)), ...
  2. select count(*) from t where trim(l_shipinstruct) = '' and ...

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 175 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8349/8
-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 8
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8349/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/9/be/src/exprs/string-functions-ir.cc@412
PS9, Line 412: context->GetNumArgs() < 2
nit: it seems slightly simpler to say context->GetNumArgs() == 1 given the DCHECK() above.


http://gerrit.cloudera.org:8080/#/c/8349/9/be/src/exprs/string-functions-ir.cc@434
PS9, Line 434: IS_CONSTANT
As discussed, this may not be entirely correct to call this "IS_CONSTANT" as pass false for the case in which chars_to_trim is actually a constant (but not white space). I think this may need to be clarified with a better name.



-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 9
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Jan 2018 22:50:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8349

to look at the new patch set (#6).

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................

IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of
characters.

This patch generalizes ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])

A common string trimming method has been added to StringFunctions,
which is called from the general ltrim/rtrim/btrim functions. The
functions also share prepare and close operations.

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 166 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8349/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 6
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 7:

(4 comments)

Thanks for the comments. Please see patch set #8

http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc@441
PS7, Line 441: IS_CONSTANT
> We can possibly get rid of this after IMPALA-6380 is fixed.
Agree. Will revisit this aspect of the code.


http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc@443
PS7, Line 443: chars_to_trim.is_null
> As discussed offline, we cannot make any assumption about the length of Str
As discussed, this seems like a bad DCHECK anyway - because '' and NULL are legit values of chars_to_trim, even when the arguments come from column elements. Removing the DCHECK, and instead check for zero-length or NULL chars_to_trim, as these do not modify the target string.

Have added code to check for NULL chars_to_trim in TrimPrepare().


http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h@87
PS7, Line 87:   static void TrimPrepare(FunctionContext*, FunctionContext::FunctionStateScope);
> A quick comment on what this Prepare() function do would be useful.
Done


http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h@149
PS7, Line 149: template <TrimPosition D, bool IS_CONSTANT>
> Please add a comment on what these template parameters stand for.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 7
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:14:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 6:

(4 comments)

Changed implementation to use templates. Also ran perf measurements to check for regressions. Please see patch set #7

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@475
PS4, Line 475: tionContext* context,
> Sure, we can keep passing StringVal(" ") for the no-arg case. It's not used
There are test cases that pass NULL for chars_to_trim, so that's taken care.


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@451
PS6, Line 451: direction == LEADING || direction == BOTH
> To avoid a logic or, we can consider doing:
Changed this to a templatized implementation to avoid branching overhead.


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@464
PS6, Line 464:   //return DoTrimStringImpl(str, unique_chars, direction);
> delete
Done


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@466
PS6, Line 466: 
             : StringVal StringFunctions::Trim(FunctionContext* context, const StringVal& str) {
             :   return DoTrimString(context, str, StringVal(" "), BOTH);
             : }
             : 
             : StringVal StringFunctions::Ltrim(FunctionContext* context, const StringVal& str) {
             :   return DoTrimString(context, str, StringVal(" "), LEADING);
             : }
             : 
             : StringVal StringFunctions::Rtrim(FunctionContext* context, const StringVal& str) {
             :   return DoTrimString(context, str, StringVal(" "), TRAILING);
             : }
> Can you please do a quick benchmark to make sure we didn't regress the perf
Have measured the performance of the new code and old trim/ltrim/rtrim on a tpch_parquet table that has 1.5 billion rows. The test query basically scans all the string columns of the table (8 of them) and calls trim/ltrim/rtrim on each element, where each operation is something like select count(trim(l_shipinstruct)), etc. The idea is to check for any unacceptable perf regression due to how the code has been refactored. Time spent in these function calls aggregated by count()s are the same in the old and new code, within margin of error.



-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 6
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Jan 2018 22:06:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 12: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 12
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Jan 2018 23:44:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 8:

(3 comments)

Looking good. Close to completion. Please find some more comments below.

http://gerrit.cloudera.org:8080/#/c/8349/8/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/8/be/src/exprs/string-functions-ir.cc@410
PS8, Line 410:   if (context->GetNumArgs() < 2) {
Please DCHECK(context->GetNumArgs() == 1 or context->GetNumArgs() == 2);


http://gerrit.cloudera.org:8080/#/c/8349/8/be/src/exprs/string-functions-ir.cc@436
PS8, Line 436:   if (str.len == 0 || chars_to_trim.len == 0 || chars_to_trim.is_null) return str;
Seems that this line can be moved. If str.len == 0, the while loop will take care of it so no need to special case as we don't expect it to be common.

"chars_to_trim" being empty or null doesn't seem to be common case and it doesn't warrant an extra if-check here as the common case is when "chars_to_trim" is constant. Instead, we may want to move if (chars_to_trim.is_null) to line 443 below for the case in which "char_to_trim" is not constant.


http://gerrit.cloudera.org:8080/#/c/8349/8/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

http://gerrit.cloudera.org:8080/#/c/8349/8/be/src/exprs/string-functions.h@153
PS8, Line 153: , flags whether or not the set of
             :   /// characters to be trimmed is fixed, or column elements
is true when the set of characters to trim is constant.



-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 8
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jan 2018 23:17:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 11: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 11
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Jan 2018 21:33:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 3:

(1 comment)

Reviving this, after it got bumped down by higher priority tasks. Please take a look at patch set #4.

http://gerrit.cloudera.org:8080/#/c/8349/3/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/3/be/src/exprs/string-functions-ir.cc@443
PS3, Line 443:   bitset<256>* unique_chars = new bitset<256>;
             :   unique_chars->set(static_cast<int>(' '), true);
> It seems unnecessary overhead to do this for every row.
Done. This does mean that the old trim functions now require prepare and close functions.



-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 3
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Dec 2017 00:03:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 10: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1751/


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 10
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Jan 2018 01:26:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@475
PS4, Line 475: tionContext* context,
> If we're going to skip on is_null, then I think we have to stick with passi
Sure, we can keep passing StringVal(" ") for the no-arg case. It's not used anyway.

That said, we still need to skip if chars_to_trim.is_null == true because this essentially means chars_to_trim is undefined. In fact, we should have a test case for it.


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@451
PS6, Line 451: direction == LEADING || direction == BOTH
To avoid a logic or, we can consider doing:

if ((direction & LEADING) != 0) {
}

if ((direction & TRAILING) != 0) {
}

For the case of BOTH, you can pass direction as (LEADING | TRAILING). Of course, that requires redefining LEADING and TRAILING to bitmaps.


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@464
PS6, Line 464:   //return DoTrimStringImpl(str, unique_chars, direction);
delete


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@466
PS6, Line 466: 
             : StringVal StringFunctions::Trim(FunctionContext* context, const StringVal& str) {
             :   return DoTrimString(context, str, StringVal(" "), BOTH);
             : }
             : 
             : StringVal StringFunctions::Ltrim(FunctionContext* context, const StringVal& str) {
             :   return DoTrimString(context, str, StringVal(" "), LEADING);
             : }
             : 
             : StringVal StringFunctions::Rtrim(FunctionContext* context, const StringVal& str) {
             :   return DoTrimString(context, str, StringVal(" "), TRAILING);
             : }
Can you please do a quick benchmark to make sure we didn't regress the perf of these expressions with this change ?

It seems that we almost should templatize StringFunctions::DoTrimString().



-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 6
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Tue, 19 Dec 2017 22:59:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8349

to look at the new patch set (#5).

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................

IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of
characters.

This patch generalizes ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])

A common string trimming method has been added to StringFunctions,
which is called from the general ltrim/rtrim/btrim functions. The
functions also share prepare and close operations.

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 186 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8349/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 5
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1756/


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 10
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Jan 2018 02:53:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 9:

(2 comments)

Thanks. Please see patch set 10.

http://gerrit.cloudera.org:8080/#/c/8349/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/9/be/src/exprs/string-functions-ir.cc@412
PS9, Line 412: context->GetNumArgs() < 2
> nit: it seems slightly simpler to say context->GetNumArgs() == 1 given the 
Done


http://gerrit.cloudera.org:8080/#/c/8349/9/be/src/exprs/string-functions-ir.cc@434
PS9, Line 434: IS_CONSTANT
> As discussed, this may not be entirely correct to call this "IS_CONSTANT" a
Renamed it as IS_IMPLICIT_WHITESPACE. Hopefully this one sticks.



-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 9
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Jan 2018 19:31:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8349

to look at the new patch set (#9).

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................

IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of
characters.

This patch generalizes ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])

A common string trimming method has been added to StringFunctions,
which is called from the general ltrim/rtrim/btrim functions. The
functions also share prepare and close operations.

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Testing: Queries like the following were run on a 1.5-billion row
tpch_parquet.lineitem table, with the old and new implementations
to ensure there is no performance regression:

  1. select count(trim(l_shipinstruct)), count(trim(l_returnflag)), ...
  2. select count(*) from t where trim(l_shipinstruct) = '' and ...

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 177 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8349/9
-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 9
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 12: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 12
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Jan 2018 20:20:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1751/


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 10
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Jan 2018 21:50:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 10:

GVO failed due to https://issues.apache.org/jira/browse/IMPALA-6215


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 10
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Jan 2018 02:52:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 10:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1756/


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 10
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Jan 2018 06:42:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8349

to look at the new patch set (#7).

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................

IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of
characters.

This patch generalizes ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])

A common string trimming method has been added to StringFunctions,
which is called from the general ltrim/rtrim/btrim functions. The
functions also share prepare and close operations.

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Testing: Queries like the following were run on a 1.5-billion row
tpch_parquet.lineitem table, with the old and new implementations
to ensure there is no performance regression:

  1. select count(trim(l_shipinstruct)), count(trim(l_returnflag)), ...
  2. select count(*) from t where trim(l_shipinstruct) = '' and ...

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 169 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8349/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 7
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8349

to look at the new patch set (#4).

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................

IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of
characters.

This patch generalizes ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])

A common string trimming method has been added to StringFunctions,
which is called from the general ltrim/rtrim/btrim functions. The
functions also share prepare and close operations.

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 173 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8349/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 4
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 10
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Jan 2018 21:49:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc@441
PS7, Line 441: IS_CONSTANT
We can possibly get rid of this after IMPALA-6380 is fixed.


http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions-ir.cc@443
PS7, Line 443: chars_to_trim.is_null
As discussed offline, we cannot make any assumption about the length of StringVal which is null.


http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h@87
PS7, Line 87:   static void TrimPrepare(FunctionContext*, FunctionContext::FunctionStateScope);
A quick comment on what this Prepare() function do would be useful.


http://gerrit.cloudera.org:8080/#/c/8349/7/be/src/exprs/string-functions.h@149
PS7, Line 149: template <TrimPosition D, bool IS_CONSTANT>
Please add a comment on what these template parameters stand for.



-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 7
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Jan 2018 00:48:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 6:

> Uploaded patch set 5.

Please see patch set #6 instead. I forgot to delete DoTrimStringImpl() and rectified that in #6


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 6
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Fri, 15 Dec 2017 21:28:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8349

to look at the new patch set (#10).

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................

IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of
characters.

This patch generalizes ltrim()/rtrim() functions to accept a second
argument that specifies the set of characters to be removed from the
leading/trailing end of the target string:

ltrim(string text[, characters text])
rtrim(string text[, characters text])

A common string trimming method has been added to StringFunctions,
which is called from the general ltrim/rtrim/btrim functions. The
functions also share prepare and close operations.

New StringFunctions tests have been added to ExprTest for the new
forms of ltrim() and rtrim(). New tests to cover handling of special
characters have also been added.

Note that our string handling functions only work with the ASCII
character set. Handling of other character sets lies outside the
scope of this patch.

The existing ltrim()/rtrim()/trim() functions that take only one
argument have been updated to use the more general methods.

Testing: Queries like the following were run on a 1.5-billion row
tpch_parquet.lineitem table, with the old and new implementations
to ensure there is no performance regression:

  1. select count(trim(l_shipinstruct)), count(trim(l_returnflag)), ...
  2. select count(*) from t where trim(l_shipinstruct) = '' and ...

Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
4 files changed, 178 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8349/10
-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 10
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 9:

(3 comments)

Thanks. Please see patch set 9.

http://gerrit.cloudera.org:8080/#/c/8349/8/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/8/be/src/exprs/string-functions-ir.cc@410
PS8, Line 410:   // There can be either 1 or 2 arguments.
> Please DCHECK(context->GetNumArgs() == 1 or context->GetNumArgs() == 2);
Done


http://gerrit.cloudera.org:8080/#/c/8349/8/be/src/exprs/string-functions-ir.cc@436
PS8, Line 436:     const StringVal& str, const StringVal& chars_to_trim) {
> Seems that this line can be moved. If str.len == 0, the while loop will tak
Done


http://gerrit.cloudera.org:8080/#/c/8349/8/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

http://gerrit.cloudera.org:8080/#/c/8349/8/be/src/exprs/string-functions.h@153
PS8, Line 153: , is true when the set of characters
             :   /// to trim is constant.
> is true when the set of characters to trim is constant.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 9
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Jan 2018 00:19:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@406
PS4, Line 406: StringFunctions::Ltrim
nit: Can you please group these 3 functions close to *TrimString variant so for better legibility ?


http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@411
PS4, Line 411: static_cast<const StringVal&>(" ")
This seems unnecessary now. Can we just pass StringVal::null() in this case ?


http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@418
PS4, Line 418: new bitset<256>()
context->Allocate<>() ?


http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@439
PS4, Line 439: delete unique_chars;
context->Free();


http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@443
PS4, Line 443: DoTrimStringImpl
Seems that this function can be merged with DoTrimString().


http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@475
PS4, Line 475: chars_to_trim.is_null
Shouldn't we skip if chars_to_trim.is_null ? We cannot make assumption about len if is_null is true.



-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 4
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Fri, 15 Dec 2017 07:57:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1787/


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 12
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Jan 2018 20:21:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 11:

Let's wait for https://gerrit.cloudera.org/#/c/9079/ to be merged before rebasing, so that we're not again held up by a flaky test failing.


-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 11
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Jan 2018 21:36:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

Posted by "Zoram Thanga (Code Review)" <ge...@cloudera.org>.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
......................................................................


Patch Set 4:

(6 comments)

Thanks for the comments. Uploading patch set #5.

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@406
PS4, Line 406: StringFunctions::Ltrim
> nit: Can you please group these 3 functions close to *TrimString variant so
Done


http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@411
PS4, Line 411: static_cast<const StringVal&>(" ")
> This seems unnecessary now. Can we just pass StringVal::null() in this case
Can do this, but please see response to comment on line 475. Retaining this for now, albeit in a simpler form.


http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@418
PS4, Line 418: new bitset<256>()
> context->Allocate<>() ?
Allocate<>() returns an uninitialized buffer. Can't use this. Else I'l have to explicitly call bitset.reset() on the returned unique_chars. Seems to me it's a lot simpler to just use the existing new/delete combination and let the bitset constructor do the initialization.


http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@439
PS4, Line 439: delete unique_chars;
> context->Free();
Please see response to previous comment.


http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@443
PS4, Line 443: DoTrimStringImpl
> Seems that this function can be merged with DoTrimString().
Done


http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@475
PS4, Line 475: chars_to_trim.is_null
> Shouldn't we skip if chars_to_trim.is_null ? We cannot make assumption abou
If we're going to skip on is_null, then I think we have to stick with passing StringVal(" ") from the no-args trim functions. Otherwise it's going to be hard to make a distinction on whether we want to trim spaces, or the chars_to_trim is null, so there's no work to be done.



-- 
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 4
Gerrit-Owner: Zoram Thanga <zo...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Zoram Thanga <zo...@cloudera.com>
Gerrit-Comment-Date: Fri, 15 Dec 2017 20:35:32 +0000
Gerrit-HasComments: Yes