Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2019/02/19 21:44:39 UTC

[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py

Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12519 )


Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................

IMPALA-8214: Fix bad plan in load_nested.py

The previous plan put the larger input on the build side of the join and
used a broadcast join, which is far from optimal because the larger input
is then copied to every node.

This speeds up data loading on my minicluster (18s vs 31s) and has a
more significant impact on a real cluster, where queries execute
much faster, the memory requirement is significantly reduced, and
the data loading can potentially be broken up into fewer chunks.

I also considered computing stats on the table to let Impala generate
the same plan, but this achieves the same goal more efficiently.
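
To illustrate why the build side and the distribution mode matter, here is a
minimal sketch of a textbook join cost model. It is not Impala's planner cost
model, and the sizes and node count are hypothetical values chosen only for
illustration; none of them come from this change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinCost:
    network_units: int         # total data sent over the network
    per_node_build_units: int  # build-side data each node must hold in memory


def broadcast_cost(build_units: int, probe_units: int, num_nodes: int) -> JoinCost:
    # Broadcast join: every node receives a full copy of the build side,
    # while the probe side is processed where it is scanned.
    return JoinCost(build_units * num_nodes, build_units)


def shuffle_cost(build_units: int, probe_units: int, num_nodes: int) -> JoinCost:
    # Partitioned (shuffle) join: both sides are hash-partitioned across the
    # nodes, so each node holds roughly 1/num_nodes of the build side.
    per_node = -(-build_units // num_nodes)  # ceiling division
    return JoinCost(build_units + probe_units, per_node)


# Hypothetical inputs: a 1000-unit table joined with a 10000-unit table on 4 nodes.
small, large, nodes = 1000, 10000, 4

bad = broadcast_cost(build_units=large, probe_units=small, num_nodes=nodes)
better = shuffle_cost(build_units=small, probe_units=large, num_nodes=nodes)

print("larger input broadcast as build side:", bad)
print("smaller input as build side, shuffled:", better)
```

Under these assumptions, broadcasting the larger input multiplies both the
network traffic and the per-node memory, which matches the direction of the
improvement described above.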

Testing:
Ran core tests. Resource estimates in planner tests changed slightly
because of the different distribution of data.

Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
---
M testdata/bin/load_nested.py
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
3 files changed, 12 insertions(+), 12 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/12519/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12519 )

Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................


Patch Set 4: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Feb 2019 01:43:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12519 )

Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................


Patch Set 4: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Feb 2019 05:46:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/12519 )

Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................


Patch Set 3: Code-Review+2

Thanks for taking this on. Looks good.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Feb 2019 01:22:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12519 )

Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................

IMPALA-8214: Fix bad plan in load_nested.py

The previous plan put the larger input on the build side of the join and
used a broadcast join, which is far from optimal because the larger input
is then copied to every node.

This speeds up data loading on my minicluster (18s vs 31s) and has a
more significant impact on a real cluster, where queries execute
much faster, the memory requirement is significantly reduced, and
the data loading can potentially be broken up into fewer chunks.

I also considered computing stats on the table to let Impala generate
the same plan, but this achieves the same goal more efficiently.

Testing:
Ran core tests. Resource estimates in planner tests changed slightly
because of the different distribution of data.

Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Reviewed-on: http://gerrit.cloudera.org:8080/12519
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M testdata/bin/load_nested.py
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
3 files changed, 12 insertions(+), 12 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12519 )

Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2160/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 19 Feb 2019 22:29:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12519 )

Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3798/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Feb 2019 01:43:22 +0000
Gerrit-HasComments: No