You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Abhishek Rawat (Code Review)" <ge...@cloudera.org> on 2019/06/13 23:56:58 UTC

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Abhishek Rawat has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13645


Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................

IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Expression rewrites for VALUES() could result in performance regression
since there is virtually no benefit of rewrite, if the expression will
only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to non-linearly increase as number of columns
increases. Similarly, there is no value in doing codegen for such const
expressions.

The rewriteExprs() for ValuesStmt class was overridden with an empty
function body. As a result rewrites for VALUES() is a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to backend. The Optimizer should take
decisions it can and so it seemed like the right place to disable/enable
codegen. The infrastructure is generic and could be extended in future
to selectively disable codegen for any given expression, if needed.

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Ran manual tests to validate that the non-linear regression in VALUES
  clause when involving increasing number of columns is no longer seen.
  Results below.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done

Base:
       - Analysis finished: 14.533ms (13.881ms)
       - Analysis finished: 36.736ms (35.478ms)
       - Analysis finished: 112.932ms (108.913ms)
       - Analysis finished: 357.739ms (352.843ms)
       - Analysis finished: 1s242ms (1s234ms)
       - Analysis finished: 5s832ms (5s815ms)
       - Analysis finished: 28s994ms (28s960ms)
       - Analysis finished: 2m28s (2m28s)

Test:
       - Analysis finished: 2.107ms (1.380ms)
       - Analysis finished: 6.176ms (4.887ms)
       - Analysis finished: 20.043ms (17.569ms)
       - Analysis finished: 58.013ms (53.620ms)
       - Analysis finished: 241.455ms (232.775ms)
       - Analysis finished: 1s084ms (1s067ms)
       - Analysis finished: 5s718ms (5s674ms)
       - Analysis finished: 45s177ms (45s107ms)

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/slot-ref.cc
M be/src/runtime/runtime-state.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
A testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
M tests/query_test/test_codegen.py
14 files changed, 176 insertions(+), 39 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13645/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 11:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13645/10/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

http://gerrit.cloudera.org:8080/#/c/13645/10/be/src/exec/union-node.cc@44
PS10, Line 44:   const int64_t num_nonconst_scalar_expr_to_be_codegened =
> Nit: Should we call this "num_nonconst_scalar_expr_to_be_codegened"? It mig
Renamed variable as suggested


http://gerrit.cloudera.org:8080/#/c/13645/10/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/13645/10/be/src/exprs/scalar-expr.h@259
PS10, Line 259: bool is_codegen_disabled
> Nit: One option is to give is_codegen_disabled a default value of false. Th
Good suggestion. Set default value as false.


http://gerrit.cloudera.org:8080/#/c/13645/10/tests/query_test/test_codegen.py
File tests/query_test/test_codegen.py:

http://gerrit.cloudera.org:8080/#/c/13645/10/tests/query_test/test_codegen.py@96
PS10, Line 96:   def test_const_scalar_expr_in_union(self, vector, 
> Nit: Since this creates tables, it would be good to use a unique_database s
done



-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 11
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Dec 2021 22:54:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7707/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 11
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Dec 2021 23:35:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#11) to the change originally created by Abhishek Rawat. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................

IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Expression rewrites for VALUES() could result in performance regression
since there is virtually no benefit of rewrite, if the expression will
only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to non-linearly increase as number of columns
increases. Similarly, there is no value in doing codegen for such const
expressions.

The rewriteExprs() for ValuesStmt class was overridden with an empty
function body. As a result rewrites for VALUES() is a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to backend. The Optimizer should take
decisions it can and so it seemed like the right place to disable/enable
codegen. The infrastructure is generic and could be extended in future
to selectively disable codegen for any given expression, if needed.

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Passed exhaustive unit-tests.
- Ran manual tests to validate that the non-linear regression in VALUES
  clause when involving increasing number of columns is no longer seen.
  Results below.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done

Base:
       - Analysis finished: 20.137ms (19.215ms)
       - Analysis finished: 46.275ms (44.597ms)
       - Analysis finished: 119.642ms (116.663ms)
       - Analysis finished: 361.195ms (355.856ms)
       - Analysis finished: 1s277ms (1s266ms)
       - Analysis finished: 5s664ms (5s640ms)
       - Analysis finished: 29s689ms (29s646ms)
       - Analysis finished: 2m (2m)

Test:
       - Analysis finished: 1.868ms (986.520us)
       - Analysis finished: 3.195ms (1.856ms)
       - Analysis finished: 7.332ms (3.484ms)
       - Analysis finished: 13.896ms (8.071ms)
       - Analysis finished: 31.015ms (18.963ms)
       - Analysis finished: 60.157ms (38.125ms)
       - Analysis finished: 113.694ms (67.642ms)
       - Analysis finished: 253.044ms (163.180ms)

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/runtime/fragment-state.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/values.test
A testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
M tests/query_test/test_codegen.py
13 files changed, 173 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13645/11
-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 11
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13645/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13645/1//COMMIT_MSG@38
PS1, Line 38: - Ran manual tests to validate that the non-linear regression in VALUES
> Will add new frontend tests
It seems we couldn't get a plan where a Union node with constant expressions is in a subplan since the planner is smart enough to not do that or such plans are not possible.



-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 9
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Dec 2021 19:03:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 11: Code-Review+2

This looks good to me. Thanks for getting this done!


-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 11
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Dec 2021 23:22:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#10) to the change originally created by Abhishek Rawat. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................

IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Expression rewrites for VALUES() could result in performance regression
since there is virtually no benefit of rewrite, if the expression will
only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to non-linearly increase as number of columns
increases. Similarly, there is no value in doing codegen for such const
expressions.

The rewriteExprs() for ValuesStmt class was overridden with an empty
function body. As a result rewrites for VALUES() is a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to backend. The Optimizer should take
decisions it can and so it seemed like the right place to disable/enable
codegen. The infrastructure is generic and could be extended in future
to selectively disable codegen for any given expression, if needed.

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Passed exhaustive unit-tests.
- Ran manual tests to validate that the non-linear regression in VALUES
  clause when involving increasing number of columns is no longer seen.
  Results below.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done

Base:
       - Analysis finished: 20.137ms (19.215ms)
       - Analysis finished: 46.275ms (44.597ms)
       - Analysis finished: 119.642ms (116.663ms)
       - Analysis finished: 361.195ms (355.856ms)
       - Analysis finished: 1s277ms (1s266ms)
       - Analysis finished: 5s664ms (5s640ms)
       - Analysis finished: 29s689ms (29s646ms)
       - Analysis finished: 2m (2m)

Test:
       - Analysis finished: 1.868ms (986.520us)
       - Analysis finished: 3.195ms (1.856ms)
       - Analysis finished: 7.332ms (3.484ms)
       - Analysis finished: 13.896ms (8.071ms)
       - Analysis finished: 31.015ms (18.963ms)
       - Analysis finished: 60.157ms (38.125ms)
       - Analysis finished: 113.694ms (67.642ms)
       - Analysis finished: 253.044ms (163.180ms)

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/slot-ref.cc
M be/src/runtime/fragment-state.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/values.test
A testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
M tests/query_test/test_codegen.py
16 files changed, 189 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13645/10
-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 10
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 10: Code-Review+1

(3 comments)

This makes sense to me. I only have a couple nits, but otherwise I'm good to go to +2.

http://gerrit.cloudera.org:8080/#/c/13645/10/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

http://gerrit.cloudera.org:8080/#/c/13645/10/be/src/exec/union-node.cc@44
PS10, Line 44:   const int64_t num_scalar_expr_to_be_codegened = state->NumScalarExprNeedsCodegen();
Nit: Should we call this "num_nonconst_scalar_expr_to_be_codegened"? It might make the calculation below clearer. Alternatively, we could just put a comment in saying that NumScalarExprNeedsCodegen() changes as we add the const exprs.


http://gerrit.cloudera.org:8080/#/c/13645/10/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/13645/10/be/src/exprs/scalar-expr.h@259
PS10, Line 259: bool is_codegen_disabled
Nit: One option is to give is_codegen_disabled a default value of false. This would let you avoid some of the changes in other files.

ScalarExpr(const ColumnType& type, bool is_constant, bool is_codegen_disabled = false);

I don't have a strong preference about this.


http://gerrit.cloudera.org:8080/#/c/13645/10/tests/query_test/test_codegen.py
File tests/query_test/test_codegen.py:

http://gerrit.cloudera.org:8080/#/c/13645/10/tests/query_test/test_codegen.py@96
PS10, Line 96:   def test_const_scalar_expr_in_union(self, vector):
Nit: Since this creates tables, it would be good to use a unique_database so the tables automatically get cleaned up.



-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 10
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Dec 2021 20:23:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9856/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 Dec 2021 01:03:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 11: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 11
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Dec 2021 05:51:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#7) to the change originally created by Abhishek Rawat. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................

WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Expression rewrites for VALUES() could result in performance regression
since there is virtually no benefit of rewrite, if the expression will
only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to non-linearly increase as number of columns
increases. Similarly, there is no value in doing codegen for such const
expressions.

The rewriteExprs() for ValuesStmt class was overridden with an empty
function body. As a result rewrites for VALUES() is a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to backend. The Optimizer should take
decisions it can and so it seemed like the right place to disable/enable
codegen. The infrastructure is generic and could be extended in future
to selectively disable codegen for any given expression, if needed.

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Ran manual tests to validate that the non-linear regression in VALUES
  clause when involving increasing number of columns is no longer seen.
  Results below.
- TODO: add frontend tests for the expression rewrite.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done

Base:
       - Analysis finished: 14.533ms (13.881ms)
       - Analysis finished: 36.736ms (35.478ms)
       - Analysis finished: 112.932ms (108.913ms)
       - Analysis finished: 357.739ms (352.843ms)
       - Analysis finished: 1s242ms (1s234ms)
       - Analysis finished: 5s832ms (5s815ms)
       - Analysis finished: 28s994ms (28s960ms)
       - Analysis finished: 2m28s (2m28s)

Test:
       - Analysis finished: 2.107ms (1.380ms)
       - Analysis finished: 6.176ms (4.887ms)
       - Analysis finished: 20.043ms (17.569ms)
       - Analysis finished: 58.013ms (53.620ms)
       - Analysis finished: 241.455ms (232.775ms)
       - Analysis finished: 1s084ms (1s067ms)
       - Analysis finished: 5s718ms (5s674ms)
       - Analysis finished: 45s177ms (45s107ms)

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/fragment-state.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
A testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
M tests/query_test/test_codegen.py
15 files changed, 173 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13645/7
-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 7
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 1:

(7 comments)

The approach makes sense.

http://gerrit.cloudera.org:8080/#/c/13645/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13645/1//COMMIT_MSG@38
PS1, Line 38:   Results below.
I think we should have some frontend tests for the expression rewrite part of this. I don't think anything too elaborate, but just queries with values clauses in a couple of positions (e.g. a simple INSERT, and maybe a VALUE clause as a subquery in a select), with expressions that would have otherwise been rewritten.

I think that would fit in ./fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java


http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exec/union-node.h
File be/src/exec/union-node.h:

http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exec/union-node.h@70
PS1, Line 70:   /// Number of const scalar expressions which will be codegened.
Can you mention that this is only used for observability? Or something along those lines.

It gets confusing otherwise with all the state variables - it's hard to understand what actually influences the control flow, etc.


http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exprs/scalar-expr.h@402
PS1, Line 402:   bool is_codegen_disabled_;
Could be const since it always initialised in the constructor.


http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/runtime/runtime-state.h@155
PS1, Line 155: uint32_t
Please use an int64_t - the style guide says to stick to signed integers (with a few exceptions) https://google.github.io/styleguide/cppguide.html#Integer_Types


http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift@153
PS1, Line 153:   // If codegen is disabled for this Expr
I was thinking about whether it would be better to invert the meaning of this field, i.e. make it is_codegen_enabled, since it's often easier to follow the code - you don't have to parse double negatives like !is_codegen_disabled.

I think this is OK, particularly since "codegen disabled" is used pretty pervasively in Impala.


http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift@154
PS1, Line 154:   5: required bool is_codegen_disabled
I would avoid renumbering the fields. you could just make this 22.

It doesn't *really* matter for this struct since it's only used internally between daemons running the same version, but renumbering thrift fields does break binary compatibility, with weird errors. So it's a bad habit to get into, I think.


http://gerrit.cloudera.org:8080/#/c/13645/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/13645/1/fe/src/main/java/org/apache/impala/analysis/Expr.java@400
PS1, Line 400:     for (Expr child: children_) {child.disableCodegen();}
Non-standard formatting. If we use braces, we generally just put the statement on multiple lines.



-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Jun 2019 16:05:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift@154
PS1, Line 154: 
> Tried to move the new field to the last spot, but did not work. Backend c++
Changed code to set the new field with setter function.



-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 7
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 Dec 2021 01:10:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9857/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 7
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 Dec 2021 01:06:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3623/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Jun 2019 00:51:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13645/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13645/1//COMMIT_MSG@38
PS1, Line 38:   Results below.
> I think we should have some frontend tests for the expression rewrite part 
Will add new frontend tests


http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exec/union-node.h
File be/src/exec/union-node.h:

http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exec/union-node.h@70
PS1, Line 70:   bool is_codegen_status_added_ = false;
> Can you mention that this is only used for observability? Or something alon
Done


http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/exprs/scalar-expr.h@402
PS1, Line 402:   /// it the ScalarExprEvaluator and TupleRow. These are cross-compiled and used by
> Could be const since it always initialised in the constructor.
Done


http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/13645/1/be/src/runtime/runtime-state.h@155
PS1, Line 155: 
> Please use an int64_t - the style guide says to stick to signed integers (w
Done


http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift@153
PS1, Line 153:   3: required i32 num_children
> I was thinking about whether it would be better to invert the meaning of th
ok


http://gerrit.cloudera.org:8080/#/c/13645/1/common/thrift/Exprs.thrift@154
PS1, Line 154: 
> I would avoid renumbering the fields. you could just make this 22.
Tried to move the new field to the last spot, but did not work. Backend c++ code always get the value as false even the frontend set it as true.


http://gerrit.cloudera.org:8080/#/c/13645/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/13645/1/fe/src/main/java/org/apache/impala/analysis/Expr.java@400
PS1, Line 400:   private boolean isConstant_;
> Non-standard formatting. If we use braces, we generally just put the statem
done



-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Nov 2021 21:50:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13645/6/tests/query_test/test_codegen.py
File tests/query_test/test_codegen.py:

http://gerrit.cloudera.org:8080/#/c/13645/6/tests/query_test/test_codegen.py@101
PS6, Line 101: 
flake8: W391 blank line at end of file



-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 Dec 2021 00:40:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9874/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 9
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 04 Dec 2021 02:01:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9883/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 10
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Dec 2021 19:27:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#5) to the change originally created by Abhishek Rawat. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................

WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Expression rewrites for VALUES() could result in performance regression
since there is virtually no benefit of rewrite, if the expression will
only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to non-linearly increase as number of columns
increases. Similarly, there is no value in doing codegen for such const
expressions.

The rewriteExprs() for ValuesStmt class was overridden with an empty
function body. As a result rewrites for VALUES() is a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to backend. The Optimizer should take
decisions it can and so it seemed like the right place to disable/enable
codegen. The infrastructure is generic and could be extended in future
to selectively disable codegen for any given expression, if needed.

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Ran manual tests to validate that the non-linear regression in VALUES
  clause when involving increasing number of columns is no longer seen.
  Results below.
- TODO: add frontend tests for the expression rewrite.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done

Base:
       - Analysis finished: 14.533ms (13.881ms)
       - Analysis finished: 36.736ms (35.478ms)
       - Analysis finished: 112.932ms (108.913ms)
       - Analysis finished: 357.739ms (352.843ms)
       - Analysis finished: 1s242ms (1s234ms)
       - Analysis finished: 5s832ms (5s815ms)
       - Analysis finished: 28s994ms (28s960ms)
       - Analysis finished: 2m28s (2m28s)

Test:
       - Analysis finished: 2.107ms (1.380ms)
       - Analysis finished: 6.176ms (4.887ms)
       - Analysis finished: 20.043ms (17.569ms)
       - Analysis finished: 58.013ms (53.620ms)
       - Analysis finished: 241.455ms (232.775ms)
       - Analysis finished: 1s084ms (1s067ms)
       - Analysis finished: 5s718ms (5s674ms)
       - Analysis finished: 45s177ms (45s107ms)

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/fragment-state.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
A testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
M tests/query_test/test_codegen.py
15 files changed, 192 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13645/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13645/6/tests/query_test/test_codegen.py
File tests/query_test/test_codegen.py:

http://gerrit.cloudera.org:8080/#/c/13645/6/tests/query_test/test_codegen.py@101
PS6, Line 101: 
> flake8: W391 blank line at end of file
fixed



-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 Dec 2021 00:43:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has removed Michael Ho from this change.  ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Removed reviewer Michael Ho.
-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has removed Tim Armstrong from this change.  ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Removed reviewer Tim Armstrong.
-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9936/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 11
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Dec 2021 23:14:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................

IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Expression rewrites for VALUES() could result in performance regression
since there is virtually no benefit of rewrite, if the expression will
only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to non-linearly increase as number of columns
increases. Similarly, there is no value in doing codegen for such const
expressions.

The rewriteExprs() for ValuesStmt class was overridden with an empty
function body. As a result rewrites for VALUES() is a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to backend. The Optimizer should take
decisions it can and so it seemed like the right place to disable/enable
codegen. The infrastructure is generic and could be extended in future
to selectively disable codegen for any given expression, if needed.

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Passed exhaustive unit-tests.
- Ran manual tests to validate that the non-linear regression in VALUES
  clause when involving increasing number of columns is no longer seen.
  Results below.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done

Base:
       - Analysis finished: 20.137ms (19.215ms)
       - Analysis finished: 46.275ms (44.597ms)
       - Analysis finished: 119.642ms (116.663ms)
       - Analysis finished: 361.195ms (355.856ms)
       - Analysis finished: 1s277ms (1s266ms)
       - Analysis finished: 5s664ms (5s640ms)
       - Analysis finished: 29s689ms (29s646ms)
       - Analysis finished: 2m (2m)

Test:
       - Analysis finished: 1.868ms (986.520us)
       - Analysis finished: 3.195ms (1.856ms)
       - Analysis finished: 7.332ms (3.484ms)
       - Analysis finished: 13.896ms (8.071ms)
       - Analysis finished: 31.015ms (18.963ms)
       - Analysis finished: 60.157ms (38.125ms)
       - Analysis finished: 113.694ms (67.642ms)
       - Analysis finished: 253.044ms (163.180ms)

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Reviewed-on: http://gerrit.cloudera.org:8080/13645
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/runtime/fragment-state.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/values.test
A testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
M tests/query_test/test_codegen.py
13 files changed, 173 insertions(+), 25 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 12
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#6) to the change originally created by Abhishek Rawat. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................

WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Expression rewrites for VALUES() could result in performance regression
since there is virtually no benefit of rewrite, if the expression will
only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to non-linearly increase as number of columns
increases. Similarly, there is no value in doing codegen for such const
expressions.

The rewriteExprs() for ValuesStmt class was overridden with an empty
function body. As a result rewrites for VALUES() is a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to backend. The Optimizer should take
decisions it can and so it seemed like the right place to disable/enable
codegen. The infrastructure is generic and could be extended in future
to selectively disable codegen for any given expression, if needed.

Testing:
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Ran manual tests to validate that the non-linear regression in VALUES
  clause when involving increasing number of columns is no longer seen.
  Results below.
- TODO: add frontend tests for the expression rewrite.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done

Base:
       - Analysis finished: 14.533ms (13.881ms)
       - Analysis finished: 36.736ms (35.478ms)
       - Analysis finished: 112.932ms (108.913ms)
       - Analysis finished: 357.739ms (352.843ms)
       - Analysis finished: 1s242ms (1s234ms)
       - Analysis finished: 5s832ms (5s815ms)
       - Analysis finished: 28s994ms (28s960ms)
       - Analysis finished: 2m28s (2m28s)

Test:
       - Analysis finished: 2.107ms (1.380ms)
       - Analysis finished: 6.176ms (4.887ms)
       - Analysis finished: 20.043ms (17.569ms)
       - Analysis finished: 58.013ms (53.620ms)
       - Analysis finished: 241.455ms (232.775ms)
       - Analysis finished: 1s084ms (1s067ms)
       - Analysis finished: 5s718ms (5s674ms)
       - Analysis finished: 45s177ms (45s107ms)

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/fragment-state.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
A testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
M tests/query_test/test_codegen.py
15 files changed, 174 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13645/6
-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#9) to the change originally created by Abhishek Rawat. ( http://gerrit.cloudera.org:8080/13645 )

Change subject: WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
......................................................................

WIP IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Expression rewrites for VALUES() could result in performance regression
since there is virtually no benefit of rewrite, if the expression will
only ever be evaluated once. The overhead of rewrites in some cases
could be huge, especially if there are several constant expressions.
The regression also seems to non-linearly increase as number of columns
increases. Similarly, there is no value in doing codegen for such const
expressions.

The rewriteExprs() for ValuesStmt class was overridden with an empty
function body. As a result rewrites for VALUES() is a no-op.

Codegen was disabled for const expressions within a UNION node, if
the UNION node is not within a subplan. This applies to all UNION nodes
with const expressions (and not just limited to UNION nodes associated
with a VALUES clause).

The decision for whether or not to enable codegen for const expressions
in a UNION is made in the planner when a UnionNode is initialized. A new
member 'is_codegen_disabled' was added to the thrift struct TExprNode
for communicating this decision to backend. The Optimizer should take
decisions it can and so it seemed like the right place to disable/enable
codegen. The infrastructure is generic and could be extended in future
to selectively disable codegen for any given expression, if needed.

Testing:
- TODO: add frontend tests for the expression rewrite.
- Added a new e2e test case in tests/query_test/test_codegen.py, which
  tests the different scenarios involving UNION with const expressions.
- Passed exhaustive unit-tests.
- Ran manual tests to validate that the non-linear regression in VALUES
  clause when involving increasing number of columns is no longer seen.
  Results below.

for i in 256 512 1024 2048 4096 8192 16384 32768;
do (echo 'VALUES ('; for x in $(seq $i);
do echo  "cast($x as string),"; done;
echo "NULL); profile;") |
time impala-shell.sh -f /dev/stdin |& grep Analysis; done

Base:
       - Analysis finished: 20.137ms (19.215ms)
       - Analysis finished: 46.275ms (44.597ms)
       - Analysis finished: 119.642ms (116.663ms)
       - Analysis finished: 361.195ms (355.856ms)
       - Analysis finished: 1s277ms (1s266ms)
       - Analysis finished: 5s664ms (5s640ms)
       - Analysis finished: 29s689ms (29s646ms)
       - Analysis finished: 2m (2m)

Test:
       - Analysis finished: 1.868ms (986.520us)
       - Analysis finished: 3.195ms (1.856ms)
       - Analysis finished: 7.332ms (3.484ms)
       - Analysis finished: 13.896ms (8.071ms)
       - Analysis finished: 31.015ms (18.963ms)
       - Analysis finished: 60.157ms (38.125ms)
       - Analysis finished: 113.694ms (67.642ms)
       - Analysis finished: 253.044ms (163.180ms)

Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/slot-ref.cc
M be/src/runtime/fragment-state.h
M common/thrift/Exprs.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/values.test
A testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
M tests/query_test/test_codegen.py
16 files changed, 189 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13645/9
-- 
To view, visit http://gerrit.cloudera.org:8080/13645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
Gerrit-Change-Number: 13645
Gerrit-PatchSet: 9
Gerrit-Owner: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>