You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Gabor Kaszab (Code Review)" <ge...@cloudera.org> on 2020/11/18 14:30:35 UTC

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16741


Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
M common/function-registry/impala_functions.py
A testdata/workloads/functional-query/queries/QueryTest/iceberg-transform-functions.test
M tests/query_test/test_iceberg.py
8 files changed, 137 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7880/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 18 Dec 2020 22:01:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 13:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/16741/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16741/12//COMMIT_MSG@9
PS12, Line 9: 
> nit: lines longer than 72 chars
Done


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc
File be/src/exprs/iceberg-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc@68
PS12, Line 68:       // If the input is negative and within int32 range while the width is bigger than
> Could you add a comment here?
Well, giving this a second (third) look this seems incorrect as width can't be negative. I rewrote this part of the code and added a comment.


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc@109
PS12, Line 109: IcebergFunctions::Trun
> This DCHECK seems unnecessary in the context of the above line.
Done


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc@111
PS12, Line 111: !CheckInputsAn
> Maybe you could wrap it to UNLIKELY
Done


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc@159
PS12, Line 159: urn IntVal::null();
> nit: maybe rename to BucketByteArray?
I remaned this but not exactly to the one you suggested. What do you think?


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc@166
PS12, Line 166: al& width) {
> seems unnecessary
Done


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-test.cc
File be/src/exprs/iceberg-functions-test.cc:

http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-test.cc@199
PS12, Line 199: ionTransform seems
              :   // problematic as it queries ARG_TYPE_
> nit: that gets the size as parameter?
Done


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-test.cc@294
PS12, Line 294:   ctx->impl()->Close();
> Could you add tests for int/long min/max values?
Done


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@352
PS12, Line 352:   /// E.g. 12065530 = [0, -72, 26, -6]
> nit: could you add an example for a negative number, e.g. -129?  From the t
Sure, Done


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@357
PS12, Line 357:     T value = input;
> Is it necessary? How about adding the bytes as they are, and in the end jus
This is needed because negative numbers are stored in two's complement format and for this algorithm to work we have to apply some special treatment for negative numbers.

L365 and L366 are meant to handle early exits from the loop. With you other suggestion below there is no longer a need for L365.


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@359
PS12, Line 359: f(value); ++i
> 0xFF is one byte
Done


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@365
PS12, Line 365:       // is nothing left to process.
> nit: We could start the loop with
Thanks, your suggestion is cleaner then what I have now. Done


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@368
PS12, Line 368: 
> 'trailing' is a bit confusing because at the end it'll be in the front.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 13
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Dec 2020 12:53:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#14).

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

This patch implements Truncate and Bucket partition transforms in
Impala BE as built-in functions. The expectation is that these
functions give the same result as Iceberg's implementation of the same
functions. These built-in functions are invisible so users won't be
able to invoke them e.g. from impala-shell.

Truncate:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal.
  - Receives an input from the above types and a width.
  - Returns the same type as the input.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#truncate-transform-details
Bucket:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal,
    DateVal, TimestampVal.
  - Receives an input from the above types and the number of buckets as
    IntVal.
  - Returns IntVal.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#bucket-transform-details

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
A be/src/thirdparty/murmurhash/MurmurHash3.cpp
A be/src/thirdparty/murmurhash/MurmurHash3.h
A be/src/thirdparty/murmurhash/README.md
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
14 files changed, 1,438 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/14
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 16: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6792/


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 17 Dec 2020 18:42:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7858/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 13
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Dec 2020 13:15:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 15: Code-Review+2

(3 comments)

carry +2 from Zoltan

http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/exprs/iceberg-functions-ir.cc
File be/src/exprs/iceberg-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/exprs/iceberg-functions-ir.cc@75
PS14, Line 75:       return TruncatePartitionTransformDecimalImpl<int32_t>(input.val4, width.val);
> TruncatePartitionTransformNumericImpl handles negative overflow, i.e. when 
Here you won't overflow as the 9-digit decimals are actually stored as int64 even though they would fit in an int32. E.g. min value of int32 (and the values close to it but still within int32 range) are stored in int64 for decimals and would hit the below case and not this one.


http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/exprs/iceberg-functions-ir.cc@163
PS14, Line 163: data(
> nit: it doesn't really make a difference, but buffer.data() could be used i
Done


http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/util/bit-util.h@363
PS14, Line 363: buffer.push_back(value_to_save);
> nit: string also has push_back member function, so this could be just
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 17 Dec 2020 13:10:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16741/6/be/src/thirdparty/murmurhash/MurmurHash3.h
File be/src/thirdparty/murmurhash/MurmurHash3.h:

http://gerrit.cloudera.org:8080/#/c/16741/6/be/src/thirdparty/murmurhash/MurmurHash3.h@21
PS6, Line 21: #else	// defined(_MSC_VER)
tab used for whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 03 Dec 2020 09:21:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#15).

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

This patch implements Truncate and Bucket partition transforms in
Impala BE as built-in functions. The expectation is that these
functions give the same result as Iceberg's implementation of the same
functions. These built-in functions are invisible so users won't be
able to invoke them e.g. from impala-shell.

Truncate:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal.
  - Receives an input from the above types and a width.
  - Returns the same type as the input.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#truncate-transform-details
Bucket:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal,
    DateVal, TimestampVal.
  - Receives an input from the above types and the number of buckets as
    IntVal.
  - Returns IntVal.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#bucket-transform-details

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
A be/src/thirdparty/murmurhash/MurmurHash3.cpp
A be/src/thirdparty/murmurhash/MurmurHash3.h
A be/src/thirdparty/murmurhash/README.md
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
14 files changed, 1,438 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/15
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7866/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Dec 2020 17:03:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

This patch implements Truncate and Bucket partition transforms in
Impala BE as built-in functions. The expectation is that these
functions give the same result as Iceberg's implementation of the same
functions. These built-in functions are invisible so users won't be
able to invoke them e.g. from impala-shell.

Truncate:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal.
  - Receives an input from the above types and a width.
  - Returns the same type as the input.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#truncate-transform-details
Bucket:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal,
    DateVal, TimestampVal.
  - Receives an input from the above types and the number of buckets as
    IntVal.
  - Returns IntVal.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#bucket-transform-details

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Reviewed-on: http://gerrit.cloudera.org:8080/16741
Reviewed-by: Gabor Kaszab <ga...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
A be/src/thirdparty/murmurhash/MurmurHash3.cpp
A be/src/thirdparty/murmurhash/MurmurHash3.h
A be/src/thirdparty/murmurhash/README.md
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
14 files changed, 1,438 insertions(+), 2 deletions(-)

Approvals:
  Gabor Kaszab: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 18
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#5).

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
M common/function-registry/impala_functions.py
7 files changed, 464 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/exprs/iceberg-functions-ir.cc
File be/src/exprs/iceberg-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/exprs/iceberg-functions-ir.cc@144
PS13, Line 144: std::vector<char> buffer;
Maybe we can avoid dynamic memory allocations if we just used std::string and hope that small string optimization kicks in.



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 13
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Dec 2020 10:56:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7850/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 Dec 2020 13:55:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6794/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 18 Dec 2020 21:40:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 12:

(13 comments)

Had a couple of comments, but the change looks good to me overall.

http://gerrit.cloudera.org:8080/#/c/16741/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16741/12//COMMIT_MSG@9
PS12, Line 9:  BE as
nit: lines longer than 72 chars


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc
File be/src/exprs/iceberg-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc@68
PS12, Line 68:       if (UNLIKELY(width.val < std::numeric_limits<int32_t>::min())) {
Could you add a comment here?


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc@109
PS12, Line 109: DCHECK(input.val < 0);
This DCHECK seems unnecessary in the context of the above line.


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc@111
PS12, Line 111: result.val > 0
Maybe you could wrap it to UNLIKELY


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc@159
PS12, Line 159: BucketPartitionTransformDecimalImpl
nit: maybe rename to BucketByteArray?


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-ir.cc@166
PS12, Line 166: * sizeof(char)
seems unnecessary


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-test.cc
File be/src/exprs/iceberg-functions-test.cc:

http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-test.cc@199
PS12, Line 199: that skips checking
              :   // function context for ARG_TYPE_SIZE.
nit: that gets the size as parameter?


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/exprs/iceberg-functions-test.cc@294
PS12, Line 294: void IcebergBucketPartitionTransformTests::TestIntegerNumbers() {
Could you add tests for int/long min/max values?


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@352
PS12, Line 352:   /// E.g. 12065530 = [0, -72, 26, -6]
nit: could you add an example for a negative number, e.g. -129?  From the tests I can see it's {-1, 127} but it'd be a useful example IMO.


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@357
PS12, Line 357:     if (input < 0) value = ~value;
Is it necessary? How about adding the bytes as they are, and in the end just remove the redundant zeroes and minus ones from the end of the array before reverse() is called?


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@359
PS12, Line 359:  2 byte range
0xFF is one byte


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@365
PS12, Line 365:       value &= ~((T)0XFF << (8 * i));
nit: We could start the loop with

 char value_to_save = value & 0XFF;
 value >>= BYTE_SIZE;


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@368
PS12, Line 368: add a trailing zero.
'trailing' is a bit confusing because at the end it'll be in the front.



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 Dec 2020 18:10:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7763/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 02 Dec 2020 08:56:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 14: Code-Review+2

(3 comments)

I've left some minor comments, but the change looks great overall!

http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/exprs/iceberg-functions-ir.cc
File be/src/exprs/iceberg-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/exprs/iceberg-functions-ir.cc@75
PS14, Line 75:       return TruncatePartitionTransformDecimalImpl<int32_t>(input.val4, width.val);
TruncatePartitionTransformNumericImpl handles negative overflow, i.e. when input.val < 0 && result.val >= 0. I think we should do it here is as well.


http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/exprs/iceberg-functions-ir.cc@163
PS14, Line 163: c_str
nit: it doesn't really make a difference, but buffer.data() could be used instead which maybe expresses the intent better


http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/16741/14/be/src/util/bit-util.h@363
PS14, Line 363: buffer.append(&value_to_save, 1);
nit: string also has push_back member function, so this could be just

 buffer.push_back(value_to_save);



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Dec 2020 17:34:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 16:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6793/


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 18 Dec 2020 17:07:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#12).

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

This patch implements Truncate and Bucket partition transforms in Impala BE as
built-in functions. The expectation is that these functions give the same
result as Iceberg's implementation of the same functions. These built-in
functions are invisible so users won't be able to invoke them e.g. from
impala-shell.

Truncate:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal.
  - Receives an input from the above types and a width.
  - Returns the same type as the input.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#truncate-transform-details
Bucket:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal, DateVal,
    TimestampVal.
  - Receives an input from the above types and the number of buckets as IntVal.
  - Returns IntVal.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#bucket-transform-details

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
A be/src/thirdparty/murmurhash/MurmurHash3.cpp
A be/src/thirdparty/murmurhash/MurmurHash3.h
A be/src/thirdparty/murmurhash/README.md
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
14 files changed, 1,411 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/12
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7760/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 01 Dec 2020 15:44:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 17: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 19 Dec 2020 03:09:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/CMakeLists.txt
File be/src/exprs/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/CMakeLists.txt@90
PS3, Line 90: A
> nit: space
Done


http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/iceberg-functions-ir.cc
File be/src/exprs/iceberg-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/iceberg-functions-ir.cc@26
PS3, Line 26: 
> nit: too much spaces
Done


http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/iceberg-functions-ir.cc@77
PS3, Line 77: }
> maybe we could have a fast-path when input.val is nonnegative, i.e. in that
Sure. I added this and the same for decimals.


http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/iceberg-functions-ir.cc@84
PS3, Line 84: n StringVal::null();
> Is this correct formula if decimal_val is negative?
Changed it meanwhile in PS4



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 02 Dec 2020 08:35:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7876/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 15
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 17 Dec 2020 13:32:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 17: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 18 Dec 2020 21:40:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16741/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/16741/1/tests/query_test/test_iceberg.py@61
PS1, Line 61: b
flake8: E501 line too long (95 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Nov 2020 14:31:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#13).

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

This patch implements Truncate and Bucket partition transforms in
Impala BE as built-in functions. The expectation is that these
functions give the same result as Iceberg's implementation of the same
functions. These built-in functions are invisible so users won't be
able to invoke them e.g. from impala-shell.

Truncate:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal.
  - Receives an input from the above types and a width.
  - Returns the same type as the input.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#truncate-transform-details
Bucket:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal,
    DateVal, TimestampVal.
  - Receives an input from the above types and the number of buckets as
    IntVal.
  - Returns IntVal.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#bucket-transform-details

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
A be/src/thirdparty/murmurhash/MurmurHash3.cpp
A be/src/thirdparty/murmurhash/MurmurHash3.h
A be/src/thirdparty/murmurhash/README.md
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
14 files changed, 1,456 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/13
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 13
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 16: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 17 Dec 2020 13:11:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#11).

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

This patch implements Truncate and Bucket partition transforms in Impala BE as
built-in functions. The expectation is that these functions give the same
result as Iceberg's implementation of the same functions. These built-in
functions are invisible so users won't be able to invoke them e.g. from
impala-shell.

Truncate:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal.
  - Receives an input from the above types and a width.
  - Returns the same type as the input.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#truncate-transform-details
Bucket:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal, DateVal,
    TimestampVal.
  - Receives an input from the above types and the number of buckets as IntVal.
  - Returns IntVal.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#bucket-transform-details

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
A be/src/thirdparty/murmurhash/MurmurHash3.cpp
A be/src/thirdparty/murmurhash/MurmurHash3.h
A be/src/thirdparty/murmurhash/README.md
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
14 files changed, 1,414 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/11
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6793/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 18 Dec 2020 11:36:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/CMakeLists.txt
File be/src/exprs/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/CMakeLists.txt@90
PS3, Line 90:  
nit: space


http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/iceberg-functions-ir.cc
File be/src/exprs/iceberg-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/iceberg-functions-ir.cc@26
PS3, Line 26:   
nit: too much spaces


http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/iceberg-functions-ir.cc@77
PS3, Line 77: return input.val - (((input.val % width.val) + width.val) % width.val);
maybe we could have a fast-path when input.val is nonnegative, i.e. in that case the expression could be simply:

 input.val - (input.val % width.val)


http://gerrit.cloudera.org:8080/#/c/16741/3/be/src/exprs/iceberg-functions-ir.cc@84
PS3, Line 84: decimal_val - (decimal_val % width);
Is this correct formula if decimal_val is negative?



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Nov 2020 11:59:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 17:

carry +2 from Zoltan


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 18 Dec 2020 21:40:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#3).

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
M common/function-registry/impala_functions.py
7 files changed, 360 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/exprs/iceberg-functions-ir.cc
File be/src/exprs/iceberg-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/exprs/iceberg-functions-ir.cc@144
PS13, Line 144: std::string buffer;
> Maybe we can avoid dynamic memory allocations if we just used std::string a
Done


http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/util/bit-util.h@355
PS13, Line 355: . -1
> Maybe we could just return the vector.
I wanted to avoid copy-ing the vector, but sure, I can return it as a return value as well. Done.


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@357
PS12, Line 357:   static std::string IntToByteBuff
> I think we don't need the negation at L357 and the multiplication with -1 a
Nice solution! I'll use this one.



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 14
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Dec 2020 16:46:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7839/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Dec 2020 10:46:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#8).

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

This patch implements Truncate and Bucket partition transforms in Impala BE as
built-in functions. The expectation is that these functions give the same
result as Iceberg's implementation of the same functions. These built-in
functions are invisible so users won't be able to invoke them e.g. from
impala-shell.

Truncate:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal.
  - Receives an input from the above types and a width.
  - Returns the same type as the input.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#truncate-transform-details
Bucket:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal, DateVal,
    TimestampVal.
  - Receives an input from the above types and the number of buckets as IntVal.
  - Returns IntVal.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#bucket-transform-details

TODO gaborkaszab: This patch is WiP because Decimal support of Bucket transform
is not part of this patch and to be implemented.

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
A be/src/thirdparty/murmurhash/MurmurHash3.cpp
A be/src/thirdparty/murmurhash/MurmurHash3.h
A be/src/thirdparty/murmurhash/README.md
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
12 files changed, 1,217 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/8
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#4).

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
M common/function-registry/impala_functions.py
7 files changed, 462 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7672/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Nov 2020 14:52:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16741/8/be/src/thirdparty/murmurhash/MurmurHash3.h
File be/src/thirdparty/murmurhash/MurmurHash3.h:

http://gerrit.cloudera.org:8080/#/c/16741/8/be/src/thirdparty/murmurhash/MurmurHash3.h@24
PS8, Line 24: #else  // defined(_MSC_VER)
> tab used for whitespace
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Dec 2020 10:39:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16741/7/be/src/exprs/iceberg-functions-ir.cc
File be/src/exprs/iceberg-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16741/7/be/src/exprs/iceberg-functions-ir.cc@59
PS7, Line 59:   return TruncateDecimal(ctx, input, width, decimal_size);
            : }
            : 
> Could you add a comment what happens here? Shouldn't we use something like 
Actually this check is not needed as it is not a valid scenario to overflow here. The reason is that when an input is close to int32_t::min then the decimal is already represented on 8bytes instead of 4.



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Dec 2020 10:15:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#17).

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

This patch implements Truncate and Bucket partition transforms in
Impala BE as built-in functions. The expectation is that these
functions give the same result as Iceberg's implementation of the same
functions. These built-in functions are invisible so users won't be
able to invoke them e.g. from impala-shell.

Truncate:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal.
  - Receives an input from the above types and a width.
  - Returns the same type as the input.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#truncate-transform-details
Bucket:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal,
    DateVal, TimestampVal.
  - Receives an input from the above types and the number of buckets as
    IntVal.
  - Returns IntVal.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#bucket-transform-details

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
A be/src/thirdparty/murmurhash/MurmurHash3.cpp
A be/src/thirdparty/murmurhash/MurmurHash3.h
A be/src/thirdparty/murmurhash/README.md
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
14 files changed, 1,438 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/17
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 13:

(2 comments)

I had some comments about simplifying IntToByteArray, other than that the change lgtm.

http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/16741/13/be/src/util/bit-util.h@355
PS13, Line 355: void
Maybe we could just return the vector.


http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/16741/12/be/src/util/bit-util.h@357
PS12, Line 357:     T value = input;
> This is needed because negative numbers are stored in two's complement form
I think we don't need the negation at L357 and the multiplication with -1 at L361 because the right shift introduces ones on the left for negative numbers.

I tested the following algorithm and it worked well:

  template<typename T>
  vector<char> IntToByteArray(T input) {
    vector<char> ret;
    T value = input;
    for (int i = 0; i < sizeof(value); ++i) {
      // Applies a mask for a byte range on the input.
      char value_to_save = value & 0XFF;
      ret.push_back(value_to_save);
      value >>= 8;
      if (value == 0 && value_to_save >= 0) break;
      if (value == -1 && value_to_save < 0) break;
    }
    std::reverse(ret.begin(), ret.end());
    return ret;
  }



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 13
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Dec 2020 10:50:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#9).

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

This patch implements Truncate and Bucket partition transforms in Impala BE as
built-in functions. The expectation is that these functions give the same
result as Iceberg's implementation of the same functions. These built-in
functions are invisible so users won't be able to invoke them e.g. from
impala-shell.

Truncate:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal.
  - Receives an input from the above types and a width.
  - Returns the same type as the input.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#truncate-transform-details
Bucket:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal, DateVal,
    TimestampVal.
  - Receives an input from the above types and the number of buckets as IntVal.
  - Returns IntVal.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#bucket-transform-details

TODO gaborkaszab: This patch is WiP because Decimal support of Bucket transform
is not part of this patch and to be implemented.

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
A be/src/thirdparty/murmurhash/MurmurHash3.cpp
A be/src/thirdparty/murmurhash/MurmurHash3.h
A be/src/thirdparty/murmurhash/README.md
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
12 files changed, 1,217 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/9
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#6).

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
A be/src/thirdparty/murmurhash/MurmurHash3.cpp
A be/src/thirdparty/murmurhash/MurmurHash3.h
A be/src/thirdparty/murmurhash/README.md
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
12 files changed, 860 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/6
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7740/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Nov 2020 16:32:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7768/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 03 Dec 2020 09:43:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16741/8/be/src/thirdparty/murmurhash/MurmurHash3.h
File be/src/thirdparty/murmurhash/MurmurHash3.h:

http://gerrit.cloudera.org:8080/#/c/16741/8/be/src/thirdparty/murmurhash/MurmurHash3.h@24
PS8, Line 24: #else	// defined(_MSC_VER)
tab used for whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Dec 2020 10:10:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Hello Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16741

to look at the new patch set (#10).

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................

WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

This patch implements Truncate and Bucket partition transforms in Impala BE as
built-in functions. The expectation is that these functions give the same
result as Iceberg's implementation of the same functions. These built-in
functions are invisible so users won't be able to invoke them e.g. from
impala-shell.

Truncate:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal.
  - Receives an input from the above types and a width.
  - Returns the same type as the input.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#truncate-transform-details
Bucket:
  - Supported types are IntVal, BigIntVal, StringVal, DecimalVal, DateVal,
    TimestampVal.
  - Receives an input from the above types and the number of buckets as IntVal.
  - Returns IntVal.
  - Expected behaviour is explained here:
    https://iceberg.apache.org/spec/#bucket-transform-details

Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
A be/src/exprs/iceberg-functions-ir.cc
A be/src/exprs/iceberg-functions-test.cc
A be/src/exprs/iceberg-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
A be/src/thirdparty/murmurhash/MurmurHash3.cpp
A be/src/thirdparty/murmurhash/MurmurHash3.h
A be/src/thirdparty/murmurhash/README.md
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
14 files changed, 1,414 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/16741/10
-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16741/7/be/src/exprs/iceberg-functions-ir.cc
File be/src/exprs/iceberg-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16741/7/be/src/exprs/iceberg-functions-ir.cc@59
PS7, Line 59:       if (input.val4 < 0 && result.val4 > 0) {
            :         return TruncatePartitionTransformDecimalImpl<int64_t>(input.val4, width.val);
            :       }
Could you add a comment what happens here? Shouldn't we use something like RETURN_IF_OVERFLOW in decimal-operators-ir.cc?

impala_udf::DecimalVal is able to hold decimals with any size, but impala::DecimalVal might only have 4 bytes of storage, this might be problematic in some cases.



-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 04 Dec 2020 11:48:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6792/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 16
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 17 Dec 2020 13:12:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7838/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Dec 2020 10:32:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 1:

Thanks for working on this. The change looks great, though I became a bit unsure about whether we want to make these functions visible to the user. However, it's definitely useful during development. Maybe write C++ unit tests instead of e2e tests, and later we can decide the visibility of these functions.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Nov 2020 14:46:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16741 )

Change subject: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7849/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 Dec 2020 13:23:06 +0000
Gerrit-HasComments: No