You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Xiaoqing Gao (Code Review)" <ge...@cloudera.org> on 2022/09/22 12:41:44 UTC

[Impala-ASF-CR] IMPALA-10891: Avoid hash exchanges in more situations

Xiaoqing Gao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19030


Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................

IMPALA-10891: Avoid hash exchanges in more situations

If the partition keys of parent fragment and child fragment have
an intersecion and the intersection keys have a high cardinality,
We can think parent fragment and child fragment have compatible
partitions.

The high cardinality factor can be set by query option.

Testing:
- Added planner tests in 'PlannerTest'

Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test
10 files changed, 133 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19030/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 1
Gerrit-Owner: Xiaoqing Gao <ga...@gmail.com>

[Impala-ASF-CR] IMPALA-10891: Avoid hash exchanges in more situations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19030 )

Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11407/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 1
Gerrit-Owner: Xiaoqing Gao <ga...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 13:03:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10891: Avoid hash exchanges in more situations

Posted by "Jian Zhang (Code Review)" <ge...@cloudera.org>.
Jian Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19030 )

Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@198
PS3, Line 198: numDistinctValues_ / rootTable.getNumRows()
> Aren't we too strict here? My understanding is that the NDV should be high 
+1. IMHO, instead of checking the factor of NDV/NumRows, a better solution is to check the factor of NDV/NumImpalaBackends since we only need to ensure data can be somehow evenly redistributed after the exchange operator.


http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test
File testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test:

http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test@1
PS3, Line 1: optimize
> Can you mention what did we optimize in this query exactly? My understandin
+1. It would be helpful if there were some micro-benchmark results to show the performance improvement and resource utilization reduction after applying the optimization.



-- 
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 3
Gerrit-Owner: Xiaoqing Gao <ga...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Comment-Date: Mon, 26 Sep 2022 02:36:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10891: Avoid hash exchanges in more situations

Posted by "Xiaoqing Gao (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19030

to look at the new patch set (#2).

Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................

IMPALA-10891: Avoid hash exchanges in more situations

If the partition keys of parent fragment and child fragment have
an intersecion and the intersection keys have a high cardinality,
We can think parent fragment and child fragment have compatible
partitions.

The high cardinality factor can be set by query option.

Testing:
- Added planner tests in 'PlannerTest'

Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test
10 files changed, 133 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19030/2
-- 
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 2
Gerrit-Owner: Xiaoqing Gao <ga...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10891: Avoid hash exchanges in more situations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19030 )

Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11408/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 2
Gerrit-Owner: Xiaoqing Gao <ga...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 13:06:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10891: Avoid hash exchanges in more situations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19030 )

Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19030/3/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/19030/3/be/src/service/query-options.h@280
PS3, Line 280:       enable_shuffle_optimize, ENABLE_SHUFFLE_OPTIMIZE, TQueryOptionLevel::ADVANCED);     \
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 3
Gerrit-Owner: Xiaoqing Gao <ga...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 12:55:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10891: Avoid hash exchanges in more situations

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19030 )

Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19030/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19030/3//COMMIT_MSG@8
PS3, Line 8: 
High level design comments: 

I think that the changes should be mainly in the planner, not in the analyzer. 

Deciding which expressions should be used in an exchange node's hash key before join nodes could be done when these exchange nodes are created: https://github.com/apache/impala/blob/296e94411d3344e2969d4b083036ff238e80ad19/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L426

Deciding when to create an extra partitioned fragment for group by is done here: https://github.com/apache/impala/blob/296e94411d3344e2969d4b083036ff238e80ad19/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L910
We could accept it as "good enough" partitioning if the child fragment's partitioning is a subset of the group by and has a high enough NDV


http://gerrit.cloudera.org:8080/#/c/19030/3//COMMIT_MSG@16
PS3, Line 16: Testing
Did you run the full test suite? You can run the tests on a review with https://jenkins.impala.io/job/gerrit-verify-dryrun-external/


http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@127
PS3, Line 127: partitionExprs_
It seems very counter intuitive to me to change a member in a "get" function.


http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@197
PS3, Line 197:  != -1
0 could be also checked to avoid division by 0


http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@198
PS3, Line 198: numDistinctValues_ / rootTable.getNumRows()
Aren't we too strict here? My understanding is that the NDV should be high enough to spread the rows ~evenly among fragment instances. For example an ndv of 100 should be enough to spread to rows evenly among 2 instances even if the number of rows is 1000


http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test
File testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test:

http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test@1
PS3, Line 1: optimize
Can you mention what did we optimize in this query exactly? My understanding is that we have a cheaper hash function in node 02 (hash id vs hash id, bool_col).


http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test@24
PS3, Line 24: optimize
Can you mention what did we optimize in this query exactly? My understanding is that we have a cheaper hash function in node 04/05 (hash id vs hash id, bigint_col) and skipping a preagregate + shuffle nodes between 02 and 03


http://gerrit.cloudera.org:8080/#/c/19030/3/testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test@56
PS3, Line 56: ====
Are there other nodes modified by this change than join and grouping aggregate?



-- 
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 3
Gerrit-Owner: Xiaoqing Gao <ga...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Sep 2022 18:58:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10891: Avoid hash exchanges in more situations

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19030 )

Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@122
PS3, Line 122:         highCardinalityExpr = groupingExpr;
             :         break;
> I am a bit concerned about the simplicity of the logic - what will happen i
Thought a bit more about this, I think that some change will be needed in the distributed planner to get a correct implementation. Somehow the partitioning expression set of the parent must get to the child to get the intersection of them and check this intersection's NDV.

Currently we decide the partitioning of the children before the parents, as the distributed planner creates a child fragments first: https://github.com/apache/impala/blob/296e94411d3344e2969d4b083036ff238e80ad19/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L109

We could get the partitioning expressions from the parent and give it as argument to the createPlanFragment() of the child. Then the child could create the intersection with its own partitioning expressions.



-- 
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 3
Gerrit-Owner: Xiaoqing Gao <ga...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Sep 2022 20:05:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10891: Avoid hash exchanges in more situations

Posted by "Xiaoqing Gao (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19030

to look at the new patch set (#3).

Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................

IMPALA-10891: Avoid hash exchanges in more situations

If the partition keys of parent fragment and child fragment have
an intersecion and the intersection keys have a high cardinality,
We can think parent fragment and child fragment have compatible
partitions.

The high cardinality factor can be set by query option.

Testing:
- Added planner tests in 'PlannerTest'

Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/optimize_shuffle.test
10 files changed, 132 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19030/3
-- 
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 3
Gerrit-Owner: Xiaoqing Gao <ga...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10891: Avoid hash exchanges in more situations

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19030 )

Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11409/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 3
Gerrit-Owner: Xiaoqing Gao <ga...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 13:13:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10891: Avoid hash exchanges in more situations

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19030 )

Change subject: IMPALA-10891: Avoid hash exchanges in more situations
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/19030/3/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@122
PS3, Line 122:         highCardinalityExpr = groupingExpr;
             :         break;
I am a bit concerned about the simplicity of the logic - what will happen if there are several expression and we find the wrong one with high cardinality? So we return col_1, but the joins before the aggregate create partitions by col_2.



-- 
To view, visit http://gerrit.cloudera.org:8080/19030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie627d07b729f1f584f589cd0251dfe2bb64a417f
Gerrit-Change-Number: 19030
Gerrit-PatchSet: 3
Gerrit-Owner: Xiaoqing Gao <ga...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Sep 2022 19:08:19 +0000
Gerrit-HasComments: Yes