Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2019/02/19 21:44:39 UTC
[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12519 )
Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................
IMPALA-8214: Fix bad plan in load_nested.py
The previous plan had the larger input on the build side of the join and
did a broadcast join, which is very suboptimal.
This speeds up data loading on my minicluster - 18s vs 31s and has a
more significant impact on a real cluster, where queries execute
much faster, the memory requirement is significantly reduced and
the data loading can potentially be broken up into fewer chunks.
I also considered computing stats on the table to let Impala generate
the same plan, but this achieves the same goal more efficiently.
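As a rough illustration of why a broadcast join with the larger input on
the build side is costly, here is a minimal sketch in Python. It is a
simplified, hypothetical cost model, not Impala's planner logic: the
function names, row counts and node count are all assumptions made up for
this example. It assumes that a broadcast join copies the whole build side
to every node, and that a partitioned join sends each row of both inputs
across the network once.

```python
# Simplified, hypothetical cost model for illustration only.
# This is NOT how Impala's planner computes costs.


def broadcast_join_cost(build_rows, num_nodes):
    """Broadcast join: the full build side is copied to every node.

    The probe side is not moved, so it does not appear in the cost.
    """
    return {
        "rows_sent": build_rows * num_nodes,
        "hash_table_rows_per_node": build_rows,
    }


def partitioned_join_cost(build_rows, probe_rows, num_nodes):
    """Partitioned join: both inputs are hash-partitioned across nodes.

    Each row is sent once, and each node builds a hash table over
    roughly 1/num_nodes of the build side.
    """
    return {
        "rows_sent": build_rows + probe_rows,
        "hash_table_rows_per_node": build_rows // num_nodes,
    }


# Assumed sizes, chosen only to make the difference visible.
LARGE_ROWS = 1_000_000
SMALL_ROWS = 1_000
NUM_NODES = 3

# Larger input on the build side of a broadcast join (the bad case).
bad_plan = broadcast_join_cost(LARGE_ROWS, NUM_NODES)

# Same broadcast join with the smaller input on the build side.
swapped_plan = broadcast_join_cost(SMALL_ROWS, NUM_NODES)

# Partitioned join that keeps the larger input on the build side.
partitioned_plan = partitioned_join_cost(LARGE_ROWS, SMALL_ROWS, NUM_NODES)

print("broadcast, large build:", bad_plan)
print("broadcast, small build:", swapped_plan)
print("partitioned, large build:", partitioned_plan)
```

Under these assumed numbers, broadcasting the larger input moves and
hashes about a thousand times more rows per node than broadcasting the
smaller one, which is consistent with the runtime and memory improvements
described above.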
Testing:
Run core tests. Resource estimates in planner tests changed slightly
because of the different distribution of data.
Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
---
M testdata/bin/load_nested.py
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
3 files changed, 12 insertions(+), 12 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/12519/3
--
To view, visit http://gerrit.cloudera.org:8080/12519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12519 )
Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................
Patch Set 4: Code-Review+2
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Feb 2019 01:43:21 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12519 )
Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................
Patch Set 4: Verified+1
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Feb 2019 05:46:46 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/12519 )
Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................
Patch Set 3: Code-Review+2
Thanks for taking this on. Looks good.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Feb 2019 01:22:47 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12519 )
Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................
IMPALA-8214: Fix bad plan in load_nested.py
The previous plan had the larger input on the build side of the join and
did a broadcast join, which is very suboptimal.
This speeds up data loading on my minicluster - 18s vs 31s and has a
more significant impact on a real cluster, where queries execute
much faster, the memory requirement is significantly reduced and
the data loading can potentially be broken up into fewer chunks.
I also considered computing stats on the table to let Impala generate
the same plan, but this achieves the same goal more efficiently.
Testing:
Run core tests. Resource estimates in planner tests changed slightly
because of the different distribution of data.
Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Reviewed-on: http://gerrit.cloudera.org:8080/12519
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M testdata/bin/load_nested.py
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
3 files changed, 12 insertions(+), 12 deletions(-)
Approvals:
Impala Public Jenkins: Looks good to me, approved; Verified
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12519 )
Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................
Patch Set 3:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/2160/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 19 Feb 2019 22:29:39 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8214: Fix bad plan in load_nested.py
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12519 )
Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................
Patch Set 4:
Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3798/ DRY_RUN=false
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Feb 2019 01:43:22 +0000
Gerrit-HasComments: No