You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Youwei Wang (Code Review)" <ge...@cloudera.org> on 2016/09/20 03:08:52 UTC

[Impala-ASF-CR] IMPALA-889: Add support for ISO-SQL trim()

Youwei Wang has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4474

Change subject: IMPALA-889: Add support for ISO-SQL trim()
......................................................................

IMPALA-889: Add support for ISO-SQL trim()

Add support for an ISO-SQL compliant trim() function.
Form #1: Impala UDF calling
  Syntax: SELECT TRIM(where, characters, string_to_be_trimmed);
  where: enumerate values, available choices are: left/leading/right/trailing/both:
    left/leading means trimming characters from the start of the source string;
    right/trailing means trimming characters from the end of the source string;
    both means trimming characters from both ends of the source string;
  Note: option is case-insensitive, which means 'left' equals 'LeFt'.
  characters: the characters to trim, which is represented as a string;
  string_to_be_trimmed: the source string to trim;
  Examples:
  select btrim('left', 'a%', 'abc%%defg%%%%%'); returns 'bc%%defg%%%%%';
  select btrim('right', 'fg%', 'abc%%defg%%%%%'); returns 'abc%%de';
  select btrim('leading', 'ab%', 'abc%%defg%%%%%'); returns 'c%%defg%%%%%';
  select btrim('trailing', 'bfg%', 'abc%%defg%%%%%'); returns 'abc%%de';
  select btrim('both', 'abfg%', 'abc%%defg%%%%%'); returns 'c%%de';

Form #2: Standard SQL syntax (Core SQL feature ID E021-09)
  Syntax: SELECT TRIM(where characters FROM string_to_be_trimmed);
  where: enumerate values, available choices are: leading/trailing/both:
  Note: option is case-insensitive, which means 'leading' equals 'LeADinG'.
  Note: left and right are Impala keywords, they are not available in this syntax.
  characters: the characters to trim, which is represented as a string;
  string_to_be_trimmed: the source string to trim;
  Examples:
  select btrim(leading 'ab%' from 'abc%%defg%%%%%'); returns 'c%%defg%%%%%';
  select btrim(trailing 'bfg%' from 'abc%%defg%%%%%'); returns 'abc%%de';
  select btrim(both 'abfg%' from 'abc%%defg%%%%%'); returns 'c%%de';

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/com/cloudera/impala/analysis/TrimExpr.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeExprsTest.java
7 files changed, 244 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................


Patch Set 6:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10:   specified direction(s) of a STRING value.
> Then change the name of the new function away from btrim to something else.
Greetings, Jim.
This old function name "btrim" has been discarded. 
And it has been replaced using the new one "trim".
Thank you.


http://gerrit.cloudera.org:8080/#/c/4474/6//COMMIT_MSG
Commit Message:

PS6, Line 14: Syntax #3 confirms the standard SQL syntax (Core SQL feature ID
> All of them are part of the standard, as is simply TRIM(string_to_be_trimme
Done


PS6, Line 18: |
> just use regular prose here: Valid options are "leading", "trailing", and "
Done


PS6, Line 22: NULL
> How is a NULL passed? Is it literally just "trim(NULL from 'foo')"?
Greetings, Jim.
Yes, it is exact as you have said.
But I have to say: "trim(NULL from 'foo')" is invalid.
The correct calling form is like: trim(NULL 'f' from 'foo');


PS6, Line 22: empty argument
> What does "empty argument" mean in this case?
Greetings, Jim. 
"empty argument" means an empty string like: "" or ''.
Extra explaination has been added. 
Thank you for pointing out this. :)


PS6, Line 23: returns
> not just "returns", "TRIM returns". Here and below.
Greetings, Jim.
Thank you for pointing out that. :)


PS6, Line 25: "trim_char_set": Case-sensitive characters to be removed. This
> Please separate paragraphs by a blank line.
Done


PS6, Line 29: " "
> Is the standard space or all whitespace, including tabs, newlines, etc.?
Greetings, Jim.
I mean the standard space here.
This comment has been fixed.
Thank you. :)


PS6, Line 32: Blank
> How is a "blank" argument different than an "empty argument" above?
Greetings, Jim.
No, "Blank" means nothing provided.
For example: select trim(both " " from );
This is the case of "blank" argument.
As you can see here, this definitely causes parsing error.

Thank you.


PS6, Line 33: ote: For Syntax #1, since no "characters" is specified, it trims
            : " "(space) by default. For Syntax #2, since no "where" is specified,
            : the option both is implied by default.
> Please split this up and put it above in the text on each option.
Done


PS6, Line 44: @&@&@&@
> This is hard to read. Can you stick to alphanumeric characters in these exa
Greetings, Jim. 
Thank you for pointing out that. 
Corresponding tests will be modified according to your guideline.


http://gerrit.cloudera.org:8080/#/c/4474/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1977:   TestStringValue("trim(leading FROM '     &@&127+  &@   ')", "&@&127+  &@   ");
> Please try to have each test case test one thing different from every other
Done


http://gerrit.cloudera.org:8080/#/c/4474/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 788: static inline bitset<256>* TrimInternalPrepare(FunctionContext* ctx,
> Please use unnamed namespaces; see https://google.github.io/styleguide/cppg
Done


PS6, Line 805: bitset
> const
Done


PS6, Line 806: &
> Out parameters, when used, are not references. https://google.github.io/sty
Done


PS6, Line 806: begin
> prefer to return, not pass out parameters.
Done


PS6, Line 807: static_cast<int>
> What type is str.ptr[begin]? What type does test take? Do you need this cas
Greetings, Jim.
The type of str.ptr[begin] is uint8_t. 
I believe we don't need this case in this case.


PS6, Line 807: test
> Please use [], not test - we don't need the bounds checking.
Done


PS6, Line 813: int &
> begin is not modified.
Done


PS6, Line 822: bitset
> const
Done


PS6, Line 824: int32_t
> Don't use int32_t and int interchangeably without a specific reason
Done


PS6, Line 835: str
> The original functions made a new string val, rather than returning this on
Greetings, Jim.
I guess it is related to the dynamic memory management.
Am I right? :)


Line 848:     option.ptr[i] = std::tolower(option.ptr[i]);
> Though awkward, please insert the required casts here.
Done


PS6, Line 851: TrimOption
> This should be done at parsing/analysis time, for efficiency reasons
Done


PS6, Line 855: INVALID
> LEADING = 1, TRAILING = 2, BOTH = LEADING | TRAILING, INVALID = ~BOTH. The 
Done


PS6, Line 858: opt_
> https://google.github.io/styleguide/cppguide.html#Variable_Names
Done


PS6, Line 861: ( 
> Please use clang-format to format your code.
Done


PS6, Line 861: 7
> try to avoid repeating literals, like 7. Maybe set static constexpr char LE
Done


PS6, Line 863: both
> bad copy
Done


PS6, Line 866: both
> bad copy
Done


http://gerrit.cloudera.org:8080/#/c/4474/1/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

PS1, Line 123: Base6
> And while you're changing the name, no need to call this BTrim.
Greetings, Jim.
I mean no offense and no harm. 
But it seems you are looking at some old change I uploaded previously. The latest change is here: 
https://gerrit.cloudera.org/#/c/4474/6/be/src/exprs/string-functions.h

Sincerely hope this link is helpful for your review. 
Thank you.


http://gerrit.cloudera.org:8080/#/c/4474/1/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

PS1, Line 478: ezza 
> This is the string you should change. Change it to something else, like sql
Greetings, Jim.
I mean no offense and no harm. But it seems you are looking at some old change I uploaded previously. The latest change is here: 
https://gerrit.cloudera.org/#/c/4474/6/common/function-registry/impala_functions.py

To be straightforward, the latest function entry is like:
[['trim'], 'STRING', ['STRING', 'STRING', 'STRING'], 'impala::StringFunctions::TrimStringWithOption',   '_ZN6impala15StringFunctions12BTrimPrepareEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE',  '_ZN6impala15StringFunctions10BTrimCloseEPN10impala_udf15FunctionContextENS2_18FunctionStateScopeE'],

I don't think we can put this entry in the invisible part since doing this could make this function entry lose connection to the front-end call.

Please feel free to point out my mistake if possible. Thank you.


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has uploaded a new patch set (#8).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Purpose: Removes all instances of one or more characters from the
  specified direction(s) of a STRING value.
Syntax #1 TRIM(where FROM string_to_be_trimmed);
Syntax #2 TRIM(trim_char_set FROM string_to_be_trimmed);
Syntax #3 TRIM(where trim_char_set FROM string_to_be_trimmed);
All syntaxes confirm the standard SQL syntax (Core SQL feature ID
  E021-09).

"where": Case-insensitive trim direction. Valid options are
  leading, trailing, and both. leading means trimming characters
  from the start; trailing means trimming characters from the end;
  both means trimming characters from both sides. These options
  should be provided without single/double quotation marks since they
  are not string literals but identifiers. NULL or empty
  argument("" or '') implies "both" option. If an invalid option is
  specified, TRIM returns target untouched. For Syntax #2, since no
  "where" is specified, the option both is implied by default.

"trim_char_set": Case-sensitive characters to be removed. This
  argument is regarded as a character set going to be removed. So the
  occurrence order of each character doesn't matter and duplicated
  instance of same character will be ignored. NULL argument implies
  " "(standard space) by default. Empty argument("" or '') makes TRIM
  return "string_to_be_trimmed" untouched. For Syntax #1, since no
  "trim_char_set" is specified, it trims " "(standard space) by
  default.

"target": Case-sensitive target string to trim. This argument can be
  NULL. Blank argument causes parsing error.

Return type: string

Examples:
Syntax #1:
trim(both 'abfg' from 'abcdefg'); Result: 'cde';
trim(leading FROM ' 123 '); Result: '123 ';
trim(trailing FROM ' 123 '); Result: ' 123';
Syntax #2:
trim('xyz' FROM 'zyxabczyx')"; Result: "abc";
Syntax #3:
trim(leading 'ab' from 'abcdefg'); Result: 'cdefg';
trim(trailing 'bfg%' from 'abcdefg'); Result: 'abcde';
trim(both 'abfg' from 'abcdefg')"; Result "cde";

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/TrimExpr.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
9 files changed, 406 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4474/3//COMMIT_MSG
Commit Message:

Line 9: Purpose: Removes all instances of one or more characters from the
> We should definitely not invent a new function name and stick to trim(). Th
Done


Line 32:   NULL. Blank argument causes parsing error.
> The 'where' and 'trim_char_set' clauses are optional, so the following are 
Done


http://gerrit.cloudera.org:8080/#/c/4474/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 2418:   {: RESULT = c; :}
> Not clear what this rule adds. The copy+paste nature of it means an increas
Done


Line 2449: 
> My understanding is that several of the trim() clauses are optional, e.g., 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has uploaded a new patch set (#4).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Purpose: Removes all instances of one or more characters from the
  specified direction(s) of a STRING value.
Syntax #1 TRIM(where FROM string_to_be_trimmed);
Syntax #2 TRIM(trim_char_set FROM string_to_be_trimmed);
Syntax #3 TRIM(where trim_char_set FROM string_to_be_trimmed);
Syntax #3 confirms the standard SQL syntax (Core SQL feature ID
  E021-09).

"where": Case-insensitive trim direction. Valid options are:
  leading|trailing|both. leading means trimming characters from the
  start; trailing means trimming characters from the end; both means
  trimming characters from both sides. These options should be
  provided without single/double quotation marks since they are not
  string literals but identifiers. NULL or empty argument implies
  "both" option. If an invalid option is specified, returns target
  untouched.
"trim_char_set": Case-sensitive characters to be removed. This
  argument is regarded as a character set going to be removed. So the
  occurrence order of each character doesn't matter and duplicated
  instance of same character will be ignored. NULL argument implies
  " "(space) by default. Empty argument return string_to_be_trimmed
  untouched.
"target": Case-sensitive target string to trim. This argument can be
  NULL. Blank argument causes parsing error.
Note: For Syntax #1, since no "characters" is specified, it trims
" "(space) by default. For Syntax #2, since no "where" is specified,
the option both is implied by default.
Return type: string

Examples:
Syntax #1:
trim(both 'abfg%' from 'abc%%defg%%%%%'); returns 'c%%de';
trim(leading FROM ' 123 '); returns '123 ';
trim(trailing FROM ' 123 '); returns ' 123';
Syntax #2:
trim('&@' FROM '&@&@&@&127+1-@&@&@&@')"; returns "127+1-";
Syntax #3:
trim(leading 'ab%' from 'abc%%defg%%%%%'); returns 'c%%defg%%%%%';
trim(trailing 'bfg%' from 'abc%%defg%%%%%'); returns 'abc%%de';
trim(both 'abfg%' from 'abc%%defg%%%%%')"; returns "c%%de";

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
A common/function-registry/string-functions-ir.cc
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/com/cloudera/impala/analysis/TrimExpr.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeExprsTest.java
10 files changed, 1,439 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10:   Purpose: Removes all instances of one or more characters from the
> Please look at invisible_functions, line 672 of that file.
Greetings, Jim.
Thank you so much for providing this great idea. However, I am afraid I must say this approach doesn't work as it is intended to be. If I move the btrim entry to the invisible_functions part, both two syntaxes are unavailable now for these two syntaxes share the same entry. I have verified this by experiments. Please feel free to point out my mistake if you think I am wrong. Thank you. :)


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-889: Add support for ISO-SQL trim()

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has uploaded a new patch set (#2).

Change subject: IMPALA-889: Add support for ISO-SQL trim()
......................................................................

IMPALA-889: Add support for ISO-SQL trim()

Add support for an ISO-SQL compliant trim() function.
Form 1: Impala UDF calling
  Syntax: BTRIM(where, characters, string_to_be_trimmed);
  "where": an enumerate value denoting the trim direction.
    Available choices are: 'left|leading|right|trailing|both'.
    This field is case-insensitive, which means 'left' equals 'LeFt'.
    left|leading - trimming characters from the start of the source string;
    right|trailing - trimming characters from the end of the source string;
    both - trimming characters from both ends of the source string;
  "characters": the characters to trim, which are represented as a string.
    The order of such characters doesn't matter. Multiple occurrances of
    one same letter will be ignored. This field is case-sensitive.
  "string_to_be_trimmed": the source string to trim. This field is
    case-sensitive.

  Examples:
  btrim('left', 'a%', 'abc%%defg%%%%%'); returns 'bc%%defg%%%%%';
  btrim('right', 'fg%', 'abc%%defg%%%%%'); returns 'abc%%de';
  btrim('leading', 'ab%', 'abc%%defg%%%%%'); returns 'c%%defg%%%%%';
  btrim('trailing', 'bfg%', 'abc%%defg%%%%%'); returns 'abc%%de';
  btrim('both', 'abfg%', 'abc%%defg%%%%%'); returns 'c%%de';

Form 2: Standard SQL syntax (Core SQL feature ID E021-09)
  Syntax: BTRIM(where characters FROM string_to_be_trimmed);
  "where": this field has the same meaning as form 1.
    Available choices are: leading/trailing/both. Since left and right are
    Impala keywords, they are not available in this form.
  "characters": this field has the same meaning as form 1.
  "string_to_be_trimmed": this field has the same meaning as form 1.

  Examples:
  btrim(leading 'ab%' from 'abc%%defg%%%%%'); returns 'c%%defg%%%%%';
  btrim(trailing 'bfg%' from 'abc%%defg%%%%%'); returns 'abc%%de';
  btrim(both 'abfg%' from 'abc%%defg%%%%%'); returns 'c%%de';

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/com/cloudera/impala/analysis/TrimExpr.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeExprsTest.java
8 files changed, 295 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................


Patch Set 8:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/4474/8//COMMIT_MSG
Commit Message:

Line 14: All syntaxes confirm the standard SQL syntax (Core SQL feature ID
Please describe the existing syntax, too, and say that it already works.


PS8, Line 18: trailing
Please capitalize these.


PS8, Line 18: leading
Please try to match our existing formatting choices. This includes breaking lines at the 70-character mark, not arbitrarily early as in line 17.


PS8, Line 19:   
Do not indent lines after the first one in a paragraph. Please use git log to study what previous commit messages looked like.


PS8, Line 23: option
this word is not needed.


PS8, Line 22: empty
            :   argument("" or '')
just say "the empty string"


PS8, Line 24: Syntax
Why is this word capitalized, here and below


PS8, Line 24: target
"target" was not used earlier. Do you mean string_to_be_trimmed?


PS8, Line 31: (standard space)
You don't have to say "standard space" here or below. I was asking about what the standard says. Are you sure it just says " "? What do postgres and mysql do?


PS8, Line 43: abcdefg
Make the parameter strings even shorter


PS8, Line 46: Syntax #2:
Please separate these sections by blank lines.


PS8, Line 51: "
What is this doing here? Same above. Please try to be more consciencious about these things.


http://gerrit.cloudera.org:8080/#/c/4474/8/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS8, Line 1977:  
Make the test cases even smaller.


http://gerrit.cloudera.org:8080/#/c/4474/8/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS8, Line 799: static
static in an unnamed namespace is redundant.


PS8, Line 817: static
Put this in the unnamed namespace


PS8, Line 817: *
&


PS8, Line 818: begin
Is this always 0? Here and below


PS8, Line 838: int begin = 0;
             :   begin = 
Make this one line


PS8, Line 849: StringVal
Does this do the right thing is is_null is true but ptr and len are set?


PS8, Line 862: option.ptr[i]
incorrect argument type


PS8, Line 865: trim_option
This is still not done at parsing/analysis time. That work happens in the front-end.


Line 893:   if (trim_option == TrimOption::LEADING || trim_option == TrimOption::BOTH) {
trim_option | TrimOption::LEADING


http://gerrit.cloudera.org:8080/#/c/4474/8/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

Line 429:   [['trim'], 'STRING', ['STRING'], 'impala::StringFunctions::Trim'],
Is this line still needed, or is it covered by the new grammar rules?


Line 430:   [['trim'], 'STRING', ['STRING', 'STRING', 'STRING'], 'impala::StringFunctions::TrimStringWithOption',
Did you try changing this name and making it invisible? What happens?


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has uploaded a new patch set (#5).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Purpose: Removes all instances of one or more characters from the
  specified direction(s) of a STRING value.
Syntax #1 TRIM(where FROM string_to_be_trimmed);
Syntax #2 TRIM(trim_char_set FROM string_to_be_trimmed);
Syntax #3 TRIM(where trim_char_set FROM string_to_be_trimmed);
Syntax #3 confirms the standard SQL syntax (Core SQL feature ID
  E021-09).

"where": Case-insensitive trim direction. Valid options are:
  leading|trailing|both. leading means trimming characters from the
  start; trailing means trimming characters from the end; both means
  trimming characters from both sides. These options should be
  provided without single/double quotation marks since they are not
  string literals but identifiers. NULL or empty argument implies
  "both" option. If an invalid option is specified, returns target
  untouched.
"trim_char_set": Case-sensitive characters to be removed. This
  argument is regarded as a character set going to be removed. So the
  occurrence order of each character doesn't matter and duplicated
  instance of same character will be ignored. NULL argument implies
  " "(space) by default. Empty argument return string_to_be_trimmed
  untouched.
"target": Case-sensitive target string to trim. This argument can be
  NULL. Blank argument causes parsing error.
Note: For Syntax #1, since no "characters" is specified, it trims
" "(space) by default. For Syntax #2, since no "where" is specified,
the option both is implied by default.
Return type: string

Examples:
Syntax #1:
trim(both 'abfg%' from 'abc%%defg%%%%%'); returns 'c%%de';
trim(leading FROM ' 123 '); returns '123 ';
trim(trailing FROM ' 123 '); returns ' 123';
Syntax #2:
trim('&@' FROM '&@&@&@&127+1-@&@&@&@')"; returns "127+1-";
Syntax #3:
trim(leading 'ab%' from 'abc%%defg%%%%%'); returns 'c%%defg%%%%%';
trim(trailing 'bfg%' from 'abc%%defg%%%%%'); returns 'abc%%de';
trim(both 'abfg%' from 'abc%%defg%%%%%')"; returns "c%%de";

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/com/cloudera/impala/analysis/TrimExpr.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeExprsTest.java
9 files changed, 429 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>

[Impala-ASF-CR] IMPALA-889: Add support for ISO-SQL trim()

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has posted comments on this change.

Change subject: IMPALA-889: Add support for ISO-SQL trim()
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10: Form #1: Impala UDF calling
> What are the pros and cons of having two different forms?
Greetings Jim.
They are completely identical. They are just two options to access the same backend function. Users are free to choose any one.


Line 11:   Syntax: SELECT TRIM(where, characters, string_to_be_trimmed);
> "select" isn't part of the syntax
Done


PS1, Line 12:   where: enumerate values, available choices are: left/leading/right/trailing/both:
> Please wrap lines at the red vertical line. I believe it's 70 characters.
Done


PS1, Line 12: enumerate values
> What is the purpose of this phrase in the sentence?
These are avaible enumerated values for the "where" fields.


PS1, Line 16: option
> What is "option"? Do you mean "where"?
Yes, I mean "where".


PS1, Line 17: is
> are, not is
Done


Line 19:   Examples:
> Please don't give examples until you explain, in prose, what it means to tr
Done


PS1, Line 30:   Note: left and right are Impala keywords, they are not available in this syntax.
> Does this match the standard?
Yes, because the standard SQL uses the leading/trailing not left/right. I believe it is okay not to support left/right here.


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-889: Add support for ISO-SQL trim()

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-889: Add support for ISO-SQL trim()
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10: Form 1: Impala UDF calling
> Greetings Jim.
"Pros" means the reasons for something, while "cons" means the reasons against. I am asking you to explain why there should be two different syntaxes for the same function.


PS1, Line 12:   "where": an enumerate value denoting the trim direction.
> Done
I don't think you mean "an enumerate value" -- "enumerate" is, as far as I know, never an adjective. I don't think you need the word at all here.


PS1, Line 13: 
This line is still 3 characters over the red line in patch set 2. Please double-check on things ilke this before replying "Done", as it's not a good use of your time to have to go back and fix it twice and it's not a good use of your reviewer's time to ask you to fix it twice.


Line 19:     The order of such characters doesn't matter. Multiple occurrances of
> Done
I noted "don't just answer those particular questions", but you did exactly that. Please don't do that.

The prose should be able to explain what the function is. A person who doesn't know what the function is and doesn't understand the name of the function should be able to understand what it does without having to decipher the examples. You can see how this works with some of the functions here:

http://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_string_functions.html


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has uploaded a new patch set (#3).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Syntax #1: BTRIM(string where, string trim_char_set, string target);
  Purpose: Removes all instances of one or more characters from the
    specified direction(s) of a STRING value.
  "where": Case-insensitive trim direction. Valid options are:
    'left|leading|right|trailing|both'. 'left|leading' means trimming
    characters from the start; 'right|trailing' means trimming
    characters from the end; 'both' means trimming characters from
    both sides. If this argument is NULL or non-NULL but none of the
    valid options, the function returns the "target" argument
    untouched. Blank argument causes parsing error.
  "trim_char_set": Case-sensitive characters to be removed. This
    argument is regarded as a character set going to be removed. So
    the occurrence order of each character doesn't matter and
    duplicated instance of same character will be ignored. If this
    argument is NULL, the function returns the target string
    untouched or it removes all occurrences of characters in the
    "trim_char_set" string according to the "where" option. Blank
    argument causes parsing error.
  "target": Case-sensitive target string to trim. Blank argument
    causes parsing error.
  Return type: string

  Examples:
  btrim('left', 'a%', 'abc%%defg%%%%%'); returns 'bc%%defg%%%%%';
  btrim('right', 'fg%', 'abc%%defg%%%%%'); returns 'abc%%de';
  btrim('leading', 'ab%', 'abc%%defg%%%%%'); returns 'c%%defg%%%%%';
  btrim('trailing', 'bfg%', 'abc%%defg%%%%%'); returns 'abc%%de';
  btrim('both', 'abfg%', 'abc%%defg%%%%%'); returns 'c%%de';

Syntax #2: BTRIM(where string FROM string);
  Purpose: Identical as Form #1 except this form confirms the
    standard SQL syntax (Core SQL feature ID E021-09).
  "where": Case-insensitive trim direction, can be one of leading,
    trailing, or both. These options should be provided without
    single/double quotation marks since they are not string literals
    but identifiers. For left and right are Impala keywords, they are
    not available in this syntax.
  The second string argument corresponds to the "trim_char_set" in
  Syntax #1 and the third string argument corresponds to the "target"
  in Syntax #1. Same argument meaning and restriction from Syntax #1
  are applied.
  Return type: string

  Examples:
  btrim(leading 'ab%' from 'abc%%defg%%%%%'); returns 'c%%defg%%%%%';
  btrim(trailing 'bfg%' from 'abc%%defg%%%%%'); returns 'abc%%de';
  btrim(both 'abfg%' from 'abc%%defg%%%%%'); returns 'c%%de';

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeExprsTest.java
7 files changed, 198 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10:   Purpose: Removes all instances of one or more characters from the
> "Pros" means the reasons for something, while "cons" means the reasons agai
Greetings, Jim.
Please be kind to allow me to give some explanations from the start:
1. In order to support the syntax like: btrim(heading|trailing|both char_set from string), it is necessary to register an entry in the common/function-registry/impala_functions.py file. Actually, it is also mapped to the entry point of the backend logic.
2. Such entry will be encoded in the frontend built-in UDF category for parsing usage and it will be called when users input such query:  \u201cselect btrim(heading|trailing|both char_set from string)\u201d;
3. Once such entry is ready and compiled, the syntax of the form #1 will be automatically supported, no matter you like it or not. This is exactly the same case as this function \u201cextract(timestamp, string unit) || extract(unit FROM timestamp)\u201d shown in this link:
http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_datetime_functions.html
It is easy for me to conceal the description of the form 1. But I can\u2019t stop users from using it if they have some hacking skills and dig deeply into the impala_functions.py file. So in my opinion, it would be more friendly to point out the existence of such syntax - maybe there is someone whoever wants it. 
I am not sure whether my explanation can answer your question that \u201cwhy there should be two different syntaxes for the same function.\u201dIf you don\u2019t think my explanation is right. Please feel free to point out, thank you. :)


PS1, Line 12:   "where": Case-insensitive trim direction. Valid options are:
> I don't think you mean "an enumerate value" -- "enumerate" is, as far as I 
Done


PS1, Line 13: 
> This line is still 3 characters over the red line in patch set 2. Please do
Done


Line 19:   "trim_char_set": Case-sensitive characters to be removed. This
> I noted "don't just answer those particular questions", but you did exactly
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................


Patch Set 3:

(4 comments)

Missing parser tests as well as TrimExpr.java

http://gerrit.cloudera.org:8080/#/c/4474/3//COMMIT_MSG
Commit Message:

Line 9: Syntax #1: BTRIM(string where, string trim_char_set, string target);
We should definitely not invent a new function name and stick to trim(). The existing trim() is a special case of the more general ISO-SQL trim(), so we should fold the old functionality into the new more general expr.


Line 32:   btrim('left', 'a%', 'abc%%defg%%%%%'); returns 'bc%%defg%%%%%';
The 'where' and 'trim_char_set' clauses are optional, so the following are also legal uses of trim():

trim('  abcde  ');
trim(right from 'abcde  ');
trim('z' from 'zzzabczzz');
trim(left 'xz' from 'abcxz');


http://gerrit.cloudera.org:8080/#/c/4474/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 2418: string_expr ::=
Not clear what this rule adds. The copy+paste nature of it means an increased maintenance burden.


Line 2449:   // Below is a special case for BTRIM. Idents are used to avoid adding new keywords.
My understanding is that several of the trim() clauses are optional, e.g., the location and the characters to trim.

The expr seems complicated enough that we should make TRIM a keyword and add a separate production "trim_expr ::=" to recognize it (keep the optional clauses in mind).

Most other DBs seem to have TRIM as a keyword also, so we'd not be alone here.


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-889: Add support for ISO-SQL trim()

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-889: Add support for ISO-SQL trim()
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10: Form #1: Impala UDF calling
What are the pros and cons of having two different forms?


Line 11:   Syntax: SELECT TRIM(where, characters, string_to_be_trimmed);
"select" isn't part of the syntax


PS1, Line 12:   where: enumerate values, available choices are: left/leading/right/trailing/both:
Please wrap lines at the red vertical line. I believe it's 70 characters.


PS1, Line 12: enumerate values
What is the purpose of this phrase in the sentence?


PS1, Line 16: option
What is "option"? Do you mean "where"?


PS1, Line 17: is
are, not is


Line 19:   Examples:
Please don't give examples until you explain, in prose, what it means to trim characters - does the order of the characters in "characters" matter? Does their multiplicity?

And don't just answer those particular questions - try to describe the function precisely.


PS1, Line 30:   Note: left and right are Impala keywords, they are not available in this syntax.
Does this match the standard?


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has uploaded a new patch set (#6).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Purpose: Removes all instances of one or more characters from the
  specified direction(s) of a STRING value.
Syntax #1 TRIM(where FROM string_to_be_trimmed);
Syntax #2 TRIM(trim_char_set FROM string_to_be_trimmed);
Syntax #3 TRIM(where trim_char_set FROM string_to_be_trimmed);
Syntax #3 confirms the standard SQL syntax (Core SQL feature ID
  E021-09).

"where": Case-insensitive trim direction. Valid options are:
  leading|trailing|both. leading means trimming characters from the
  start; trailing means trimming characters from the end; both means
  trimming characters from both sides. These options should be
  provided without single/double quotation marks since they are not
  string literals but identifiers. NULL or empty argument implies
  "both" option. If an invalid option is specified, returns target
  untouched.
"trim_char_set": Case-sensitive characters to be removed. This
  argument is regarded as a character set going to be removed. So the
  occurrence order of each character doesn't matter and duplicated
  instance of same character will be ignored. NULL argument implies
  " "(space) by default. Empty argument return string_to_be_trimmed
  untouched.
"target": Case-sensitive target string to trim. This argument can be
  NULL. Blank argument causes parsing error.
Note: For Syntax #1, since no "characters" is specified, it trims
" "(space) by default. For Syntax #2, since no "where" is specified,
the option both is implied by default.
Return type: string

Examples:
Syntax #1:
trim(both 'abfg%' from 'abc%%defg%%%%%'); returns 'c%%de';
trim(leading FROM ' 123 '); returns '123 ';
trim(trailing FROM ' 123 '); returns ' 123';
Syntax #2:
trim('&@' FROM '&@&@&@&127+1-@&@&@&@')"; returns "127+1-";
Syntax #3:
trim(leading 'ab%' from 'abc%%defg%%%%%'); returns 'c%%defg%%%%%';
trim(trailing 'bfg%' from 'abc%%defg%%%%%'); returns 'abc%%de';
trim(both 'abfg%' from 'abc%%defg%%%%%')"; returns "c%%de";

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/TrimExpr.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
9 files changed, 427 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10:   Purpose: Removes all instances of one or more characters from the
> Greetings, Jim.
Please look at invisible_functions, line 672 of that file.


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has uploaded a new patch set (#7).

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................

IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Purpose: Removes all instances of one or more characters from the
  specified direction(s) of a STRING value.
Syntax #1 TRIM(where FROM string_to_be_trimmed);
Syntax #2 TRIM(trim_char_set FROM string_to_be_trimmed);
Syntax #3 TRIM(where trim_char_set FROM string_to_be_trimmed);
All syntaxes confirm the standard SQL syntax (Core SQL feature ID
  E021-09).

"where": Case-insensitive trim direction. Valid options are
  leading, trailing, and both. leading means trimming characters
  from the start; trailing means trimming characters from the end;
  both means trimming characters from both sides. These options
  should be provided without single/double quotation marks since they
  are not string literals but identifiers. NULL or empty
  argument("" or '') implies "both" option. If an invalid option is
  specified, TRIM returns target untouched. For Syntax #2, since no
  "where" is specified, the option both is implied by default.

"trim_char_set": Case-sensitive characters to be removed. This
  argument is regarded as a character set going to be removed. So the
  occurrence order of each character doesn't matter and duplicated
  instance of same character will be ignored. NULL argument implies
  " "(standard space) by default. Empty argument("" or '') makes TRIM
  return "string_to_be_trimmed" untouched. For Syntax #1, since no
  "trim_char_set" is specified, it trims " "(standard space) by
  default.

"target": Case-sensitive target string to trim. This argument can be
  NULL. Blank argument causes parsing error.

Return type: string

Examples:
Syntax #1:
trim(both 'abfg' from 'abcdefg'); Result: 'cde';
trim(leading FROM ' 123 '); Result: '123 ';
trim(trailing FROM ' 123 '); Result: ' 123';
Syntax #2:
trim('xyz' FROM 'zyxabczyx')"; Result: "abc";
Syntax #3:
trim(leading 'ab' from 'abcdefg'); Result: 'cdefg';
trim(trailing 'bfg%' from 'abcdefg'); Result: 'abcde';
trim(both 'abfg' from 'abcdefg')"; Result "cde";

Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M common/function-registry/impala_functions.py
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/TrimExpr.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
9 files changed, 406 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/4474/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................


Patch Set 6:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/4474/1//COMMIT_MSG
Commit Message:

Line 10:   specified direction(s) of a STRING value.
> Greetings, Jim.
Then change the name of the new function away from btrim to something else.


http://gerrit.cloudera.org:8080/#/c/4474/6//COMMIT_MSG
Commit Message:

PS6, Line 14: Syntax #3 confirms the standard SQL syntax (Core SQL feature ID
All of them are part of the standard, as is simply TRIM(string_to_be_trimmed).

http://savage.net.au/SQL/sql-92.bnf.html


PS6, Line 18: |
just use regular prose here: Valid options are "leading", "trailing", and "both"


PS6, Line 22: empty argument
What does "empty argument" mean in this case?


PS6, Line 22: NULL
How is a NULL passed? Is it literally just "trim(NULL from 'foo')"?


PS6, Line 23: returns
not just "returns", "TRIM returns". Here and below.


PS6, Line 25: "trim_char_set": Case-sensitive characters to be removed. This
Please separate paragraphs by a blank line.


PS6, Line 29: " "
Is the standard space or all whitespace, including tabs, newlines, etc.?


PS6, Line 32: Blank
How is a "blank" argument different than an "empty argument" above?


PS6, Line 33: ote: For Syntax #1, since no "characters" is specified, it trims
            : " "(space) by default. For Syntax #2, since no "where" is specified,
            : the option both is implied by default.
Please split this up and put it above in the text on each option.


PS6, Line 44: @&@&@&@
This is hard to read. Can you stick to alphanumeric characters in these examples, please? The smaller the input that illustrates the point, the clearer it is.

This applies to tests, as well.


http://gerrit.cloudera.org:8080/#/c/4474/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1977:   TestStringValue("trim(leading FROM '     &@&127+  &@   ')", "&@&127+  &@   ");
Please try to have each test case test one thing different from every other test case. What a case is testing should be clear by looking at it. The smaller the test case is, the better.

As an example, would this test case be better as "(trim leading FROM ' a b ')"?


http://gerrit.cloudera.org:8080/#/c/4474/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 788: static inline bitset<256>* TrimInternalPrepare(FunctionContext* ctx,
Please use unnamed namespaces; see https://google.github.io/styleguide/cppguide.html


PS6, Line 805: bitset
const


PS6, Line 806: &
Out parameters, when used, are not references. https://google.github.io/styleguide/cppguide.html#Reference_Arguments


PS6, Line 806: begin
prefer to return, not pass out parameters.


PS6, Line 807: test
Please use [], not test - we don't need the bounds checking.


PS6, Line 807: static_cast<int>
What type is str.ptr[begin]? What type does test take? Do you need this cast? Why or why not?


PS6, Line 813: int &
begin is not modified.


PS6, Line 822: bitset
const


PS6, Line 824: int32_t
Don't use int32_t and int interchangeably without a specific reason


PS6, Line 835: str
The original functions made a new string val, rather than returning this one. Do you know why?


Line 848:     option.ptr[i] = std::tolower(option.ptr[i]);
Though awkward, please insert the required casts here.


PS6, Line 851: TrimOption
This should be done at parsing/analysis time, for efficiency reasons


PS6, Line 855: INVALID
LEADING = 1, TRAILING = 2, BOTH = LEADING | TRAILING, INVALID = ~BOTH. The comparison on line 886 gets simplified.


PS6, Line 858: opt_
https://google.github.io/styleguide/cppguide.html#Variable_Names


PS6, Line 861: ( 
Please use clang-format to format your code.


PS6, Line 861: 7
try to avoid repeating literals, like 7. Maybe set static constexpr char LEADING[] = "leading", then check if option.len == sizeof(LEADING)-1? See how that looks.


PS6, Line 863: both
bad copy


PS6, Line 866: both
bad copy


http://gerrit.cloudera.org:8080/#/c/4474/1/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

PS1, Line 123: Base6
And while you're changing the name, no need to call this BTrim.


http://gerrit.cloudera.org:8080/#/c/4474/1/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

PS1, Line 478: ezza 
This is the string you should change. Change it to something else, like sql92trim, then make that invisible.


-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has abandoned this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>

Re: [Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by Jim Apple <jb...@apache.org>.
There are more than 40 other code reviews pending right now.

https://gerrit.cloudera.org/#/q/status:open+project:Impala-ASF

This is not an urgent bug fix, so it's lower priority, and it's
already on patch set 8, and it's not that close to being +1ed.

Please be patient.

On Sun, Oct 9, 2016 at 6:25 PM, Youwei Wang (Code Review)
<ge...@cloudera.org> wrote:
> Youwei Wang has posted comments on this change.
>
> Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
> ......................................................................
>
>
> Patch Set 8:
>
> Greetings everyone. The review status of this patch has paused for a few days. If possible, would you please move this a bit forward? Thank you. :)
>
> --
> To view, visit http://gerrit.cloudera.org:8080/4474
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
> Gerrit-PatchSet: 8
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Youwei Wang <yo...@intel.com>
> Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
> Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
> Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
> Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
> Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-889: Add support for an ISO-SQL compliant trim() function.

Posted by "Youwei Wang (Code Review)" <ge...@cloudera.org>.
Youwei Wang has posted comments on this change.

Change subject: IMPALA-889: Add support for an ISO-SQL compliant trim() function.
......................................................................


Patch Set 8:

Greetings everyone. The review status of this patch has paused for a few days. If possible, would you please move this a bit forward? Thank you. :)

-- 
To view, visit http://gerrit.cloudera.org:8080/4474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4753c608b0b00569bf8c5e95b132df6df358e602
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang <yo...@intel.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <yo...@intel.com>
Gerrit-HasComments: No