Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/07/03 00:33:00 UTC

[jira] [Resolved] (IMPALA-5547) Improve join cardinality estimation with a more robust FK/PK detection

     [ https://issues.apache.org/jira/browse/IMPALA-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Behm resolved IMPALA-5547.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

commit 9f678a74269250bf5c7ae2c5e8afd93c5b3734de
Author: Alex Behm <al...@cloudera.com>
Date:   Tue Jun 6 16:54:41 2017 -0700

    IMPALA-5547: Rework FK/PK join detection.
    
    Reworks the FK/PK join detection logic to:
    - more accurately recognize many-to-many joins
    - avoid dim/dim joins for multi-column PKs
    
    The new detection logic maintains our existing philosophy of generally
    assuming a FK/PK join, unless there is strong evidence to the
    contrary, as follows.
    
    For each set of simple equi-join conjuncts between two tables, we
    compute the joint NDV of the right-hand side columns by
    multiplication, and if the joint NDV is significantly smaller than
    the right-hand side row count, then we are fairly confident that the
    right-hand side is not a PK. Otherwise, we assume the set of conjuncts
    could represent a FK/PK relationship.
    
    Extends the explain plan to include the outcome of the FK/PK detection
    at EXPLAIN_LEVEL > STANDARD.
    
    Performance testing:
    1. Full TPC-DS run on 10TB:
       - Q10 improved by >100x
       - Q72 improved by >25x
       - Q17,Q26,Q29 improved by 2x
       - Q64 regressed by 10x
       - Total runtime: Improved by 2x
       - Geomean: Minor improvement
       The regression of Q64 is understood and we will try to address it
       in follow-on changes. The previous plan was better by accident and
       not because of superior logic.
    2. Nightly TPC-H and TPC-DS runs:
       - No perf differences
    
    Testing:
    - The existing planner tests cover the changes.
    - Code/hdfs run passed.
    
    Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
    Reviewed-on: http://gerrit.cloudera.org:8080/7257
    Reviewed-by: Alex Behm <al...@cloudera.com>
    Tested-by: Impala Public Jenkins
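
The detection rule described in the commit message above (joint NDV of the right-hand side columns, computed by multiplication, compared against the right-hand side row count) can be illustrated with a minimal sketch. This is not the Impala frontend code. The function names, the 5% slack standing in for "significantly smaller", and the cap of the joint NDV at the row count are all assumptions made for illustration.

```python
from math import prod

# Hypothetical tolerance: how far below the row count the joint NDV may fall
# before we stop believing the right-hand side columns form a PK.
# The commit message only says "significantly smaller"; 5% is an assumption.
PK_SLACK = 0.05


def joint_ndv(rhs_ndvs, rhs_row_count):
    """Joint NDV of the right-hand side columns, computed by multiplication.

    Capped at the row count, since a table cannot hold more distinct
    column combinations than rows (the cap is an assumption of this sketch).
    """
    return min(prod(rhs_ndvs), rhs_row_count)


def could_be_fk_pk(rhs_ndvs, rhs_row_count, slack=PK_SLACK):
    """Return True unless there is strong evidence that the RHS is not a PK."""
    if not rhs_ndvs or rhs_row_count <= 0:
        # No usable stats: keep the default assumption of a FK/PK join.
        return True
    return joint_ndv(rhs_ndvs, rhs_row_count) >= rhs_row_count * (1.0 - slack)


# Single-column key whose NDV matches the row count: plausible PK.
print(could_be_fk_pk([1_000_000], 1_000_000))     # True
# Column with few distinct values in a large table: strong evidence against a PK.
print(could_be_fk_pk([100], 1_000_000))           # False
# Two columns whose NDVs multiply past the row count: plausible multi-column PK.
print(could_be_fk_pk([1_000, 2_000], 1_500_000))  # True
```

The default answer is True, which mirrors the stated philosophy of assuming a FK/PK join unless the statistics clearly contradict it.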


> Improve join cardinality estimation with a more robust FK/PK detection
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-5547
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5547
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
>            Reporter: Alexander Behm
>            Assignee: Alexander Behm
>            Priority: Critical
>              Labels: performance, planner, tpc-ds
>             Fix For: Impala 2.10.0
>
>
> This JIRA is for tracking improvements to our join-cardinality estimation. In particular, we should improve the handling of many-to-many joins and multi-column joins. It is understood that some cases cannot be reliably detected with our limited metadata and statistics, but we should try our best given those limitations.
> Further improving cardinality estimation with new metadata and statistics is tracked elsewhere, e.g.:
> IMPALA-2416
> IMPALA-3531


