You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tianyi Wang (Code Review)" <ge...@cloudera.org> on 2018/04/17 18:09:30 UTC

[Impala-ASF-CR](2.x) IMPALA-6822: Add a query option to control shuffling by distinct exprs

Hello Impala Public Jenkins,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/10087

to review the following change.


Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs
......................................................................

IMPALA-6822: Add a query option to control shuffling by distinct exprs

IMPALA-4794 changed the distinct aggregation behavior to shuffling by
both grouping exprs and the distinct expr. It's slower in queries
where the NDVs of grouping exprs are high and data are uniformly
distributed among groups. This patch adds a query option controlling
this behavior, letting users switch to the old plan.

Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Reviewed-on: http://gerrit.cloudera.org:8080/9949
Reviewed-by: Tianyi Wang <tw...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
(cherry picked from commit 9a751f00b8a399116c12a81e130a696b01eb1ba8)
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/distinct.test
M tests/query_test/test_aggregation.py
9 files changed, 454 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/10087/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 10087
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR](2.x) IMPALA-6822: Add a query option to control shuffling by distinct exprs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10087 )

Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs
......................................................................

IMPALA-6822: Add a query option to control shuffling by distinct exprs

IMPALA-4794 changed the distinct aggregation behavior to shuffling by
both grouping exprs and the distinct expr. It's slower in queries
where the NDVs of grouping exprs are high and data are uniformly
distributed among groups. This patch adds a query option controlling
this behavior, letting users switch to the old plan.

Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Reviewed-on: http://gerrit.cloudera.org:8080/9949
Reviewed-by: Tianyi Wang <tw...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
(cherry picked from commit 9a751f00b8a399116c12a81e130a696b01eb1ba8)
Reviewed-on: http://gerrit.cloudera.org:8080/10087
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/distinct.test
M tests/query_test/test_aggregation.py
9 files changed, 454 insertions(+), 24 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/10087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: merged
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 10087
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR](2.x) IMPALA-6822: Add a query option to control shuffling by distinct exprs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10087 )

Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs
......................................................................


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2319/


-- 
To view, visit http://gerrit.cloudera.org:8080/10087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 10087
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Apr 2018 22:28:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6822: Add a query option to control shuffling by distinct exprs

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/10087 )

Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs
......................................................................


Patch Set 1: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 10087
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Apr 2018 18:17:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6822: Add a query option to control shuffling by distinct exprs

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/10087 )

Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs
......................................................................


Patch Set 1:

Job failed on pypi stuff. Rebasing.


-- 
To view, visit http://gerrit.cloudera.org:8080/10087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 10087
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Apr 2018 22:28:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6822: Add a query option to control shuffling by distinct exprs

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/10087 )

Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs
......................................................................


Patch Set 2: Code-Review+2

Rebased to get PyPi fix ( https://gerrit.cloudera.org/#/c/10093/ ). Carry +2.


-- 
To view, visit http://gerrit.cloudera.org:8080/10087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 10087
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Apr 2018 03:03:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6822: Add a query option to control shuffling by distinct exprs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10087 )

Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2326/


-- 
To view, visit http://gerrit.cloudera.org:8080/10087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 10087
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Apr 2018 03:04:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6822: Add a query option to control shuffling by distinct exprs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10087 )

Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2319/


-- 
To view, visit http://gerrit.cloudera.org:8080/10087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 10087
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Apr 2018 18:19:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6822: Add a query option to control shuffling by distinct exprs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10087 )

Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs
......................................................................


Patch Set 2: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/10087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc
Gerrit-Change-Number: 10087
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Apr 2018 06:55:38 +0000
Gerrit-HasComments: No