Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/07/03 00:33:00 UTC
[jira] [Resolved] (IMPALA-5547) Improve join cardinality estimation with a more robust FK/PK detection
[ https://issues.apache.org/jira/browse/IMPALA-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Behm resolved IMPALA-5547.
------------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.10.0
commit 9f678a74269250bf5c7ae2c5e8afd93c5b3734de
Author: Alex Behm <al...@cloudera.com>
Date: Tue Jun 6 16:54:41 2017 -0700
IMPALA-5547: Rework FK/PK join detection.
Reworks the FK/PK join detection logic to:
- more accurately recognize many-to-many joins
- avoid dim/dim joins for multi-column PKs
The new detection logic maintains our existing philosophy of generally
assuming a FK/PK join, unless there is strong evidence to the
contrary, as follows.
For each set of simple equi-join conjuncts between two tables, we
compute the joint NDV of the right-hand side columns by
multiplication, and if the joint NDV is significantly smaller than
the right-hand side row count, then we are fairly confident that the
right-hand side is not a PK. Otherwise, we assume the set of conjuncts
could represent a FK/PK relationship.
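The check described above can be sketched roughly as follows. This is an
illustrative Python sketch, not the planner code from the commit: the
function names and the `slack` parameter are hypothetical, and the 5%
default merely stands in for "significantly smaller", which the commit
message does not quantify.

```python
from math import prod


def joint_ndv(rhs_ndvs):
    """Joint NDV of the right-hand side join columns, computed by multiplication."""
    return prod(rhs_ndvs)


def could_be_fk_pk(rhs_ndvs, rhs_row_count, slack=0.05):
    """Return True if a set of equi-join conjuncts could represent FK/PK.

    rhs_ndvs:      NDVs of the right-hand side columns in the conjuncts
    rhs_row_count: row count of the right-hand side table
    slack:         hypothetical tolerance; the joint NDV must fall more than
                   this fraction below the row count before we conclude that
                   the right-hand side is not a PK (assumed value)
    """
    # Default to assuming FK/PK; only a joint NDV clearly below the row
    # count counts as strong evidence to the contrary.
    return joint_ndv(rhs_ndvs) >= rhs_row_count * (1.0 - slack)


if __name__ == "__main__":
    # Single column whose NDV matches the row count: plausibly a PK.
    print(could_be_fk_pk([1000], 1000))
    # Two columns whose joint NDV (10 * 20) is far below 1000 rows: not a PK.
    print(could_be_fk_pk([10, 20], 1000))
```

Multiplying per-column NDVs gives an upper bound on the number of
distinct column combinations, so the check only rejects FK/PK when even that
optimistic bound falls well short of the row count.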
Extends the explain plan to include the outcome of the FK/PK detection
at EXPLAIN_LEVEL > STANDARD.
Performance testing:
1. Full TPC-DS run on 10TB:
- Q10 improved by >100x
- Q72 improved by >25x
- Q17,Q26,Q29 improved by 2x
- Q64 regressed by 10x
- Total runtime: Improved by 2x
- Geomean: Minor improvement
The regression of Q64 is understood and we will try to address it
in follow-on changes. The previous plan was better by accident and
not because of superior logic.
2. Nightly TPC-H and TPC-DS runs:
- No perf differences
Testing:
- The existing planner tests cover the changes.
- Code/hdfs run passed.
Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Reviewed-on: http://gerrit.cloudera.org:8080/7257
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
> Improve join cardinality estimation with a more robust FK/PK detection
> ----------------------------------------------------------------------
>
> Key: IMPALA-5547
> URL: https://issues.apache.org/jira/browse/IMPALA-5547
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
> Reporter: Alexander Behm
> Assignee: Alexander Behm
> Priority: Critical
> Labels: performance, planner, tpc-ds
> Fix For: Impala 2.10.0
>
>
> This JIRA is for tracking improvements to our join-cardinality estimation. In particular, we should improve the handling of many-to-many joins and multi-column joins. It is understood that some cases cannot be reliably detected with our limited metadata and statistics, but we should try our best given those limitations.
> Further improving cardinality estimation with new metadata and statistics is tracked elsewhere, e.g.:
> IMPALA-2416
> IMPALA-3531