You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Paul Rogers (Code Review)" <ge...@cloudera.org> on 2019/03/08 20:31:07 UTC

[Impala-ASF-CR] IMPALA-8014: Revise FK/PK cardinality estimation

Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12535

to look at the new patch set (#5).

Change subject: IMPALA-8014: Revise FK/PK cardinality estimation
......................................................................

IMPALA-8014: Revise FK/PK cardinality estimation

Impala uses two techniques to estimate join cardinality: one for FK/PK,
another for the "generic" case. (IMPALA-8018 suggests we combine the
two. But, for the purposes of this fix, we leave them separate.)

The code for the FK/PK estimation is mostly right for a key that
consists of a singe column. But, if a key is compound (contains more
than one column), the calculations can produce incorrect results.

This patch modifies the code to use the standard join calculation:
|join| = |left| * |right| / max(|left key|, |right key|).

The math for compound keys had a number of errors. Since no test tables
represent a proper compound key, all tests use unrealistic keys such as
a unique Id along with a second, superfluous column. To make the math
come out OK for these, the code included adjustments that misfired on
real-world keys. This patch fixes those errors. A separate patch
introduces new tables with the required schema.

Tests: Many planner tests were modified to reflect the new join
cardinality estimate. Some plans changed, including some TPC-H plans.
Inspected each change to verify that the numbers are correct using the
formula cited above.

Turns out that the change caused a spill test to fail. The new plan is
more efficient and didn't require as much memory. Shrank the memory
given to the query to force the newer, more efficient plan to also
spill.

Testing was also done using a production query that misfired without
this fix, but that produced correct results with this fix.

Change-Id: Id0db82564596b0a9271b89b8e3f7a3cf92d306da
---
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-multi-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
17 files changed, 1,920 insertions(+), 1,829 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/12535/5
-- 
To view, visit http://gerrit.cloudera.org:8080/12535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0db82564596b0a9271b89b8e3f7a3cf92d306da
Gerrit-Change-Number: 12535
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>