Posted to issues@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/02/28 06:11:00 UTC

[jira] [Created] (IMPALA-8262) Join cardinality not decreased by join filter selectivity

Paul Rogers created IMPALA-8262:
-----------------------------------

             Summary: Join cardinality not decreased by join filter selectivity
                 Key: IMPALA-8262
                 URL: https://issues.apache.org/jira/browse/IMPALA-8262
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers


Consider a subset of the plan for TPC-H query 7. (See {{tpch-all.test}} for details.)

{noformat}
11:AGGREGATE [FINALIZE]
|  output: sum(l_extendedprice * (1 - l_discount))
|  group by: n1.n_name, n2.n_name, year(l_shipdate)
|  row-size=58B cardinality=575.77K
|
10:HASH JOIN [INNER JOIN]
|  hash predicates: c_nationkey = n2.n_nationkey
|  other predicates: ((n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY') OR (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE'))
|  row-size=132B cardinality=575.77K
|
|--05:SCAN HDFS [tpch.nation n2]
|     row-size=21B cardinality=25
|
09:HASH JOIN [INNER JOIN]
|  hash predicates: s_nationkey = n1.n_nationkey
|  row-size=111B cardinality=575.77K
{noformat}

Here, join 09 feeds 576K rows into join 10, and all 576K rows pass through to aggregate 11. Notice, however, that join 10 has a filter (listed under "other predicates") that picks out 2 of the 25 countries in each of two paths. The selectivity of that filter should be something like 2 * 2/25 = 0.16. Thus, the output cardinality of join 10 should be about 576K * 0.16 = 92K.
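
For reference, the rough estimate above can be reproduced with a few lines of Python. This is illustrative arithmetic only, not planner code; the variable names are made up for the example, and the 2 * 2/25 selectivity is the approximation used in this description rather than a value the planner computes.

```python
# Illustrative arithmetic only; none of these names come from the Impala code base.
input_cardinality = 575_770  # rows produced by join 09 (cardinality=575.77K in the plan)
num_nations = 25             # rows in tpch.nation (cardinality=25 in the plan)

# Rough selectivity of the OR predicate on join 10, as estimated in the text:
# 2 of the 25 countries, in each of two paths.
selectivity = 2 * 2 / num_nations

# Expected output cardinality of join 10 if the filter selectivity were applied.
expected_cardinality = round(input_cardinality * selectivity)

print(f"selectivity={selectivity:.2f}, expected cardinality={expected_cardinality}")
```

With the unrounded 575.77K input from the plan, this comes out at roughly 92K rows, versus the 575.77K the plan currently reports for join 10.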

The problem is that the join cardinality calculations don't consider join filter selectivity.

It may be that this was done to handle the outer join case, in which filters applied in the outer-side scan must be re-applied on the join. Omitting the filters avoids duplicate accounting for the selectivity.

But, that case is special and should be handled specially as part of IMPALA-8213. Except for correlated filters, the planner *should* apply join filter selectivity to the join output cardinality calculations.
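
A minimal sketch of the proposed behavior, in Python for brevity. The function name, the (selectivity, is_correlated) pair representation, and the assumption that independent filter selectivities simply multiply are all hypothetical choices made for illustration; this is not the planner's actual API, and the real change would live in the Java frontend.

```python
from math import prod


def adjusted_join_cardinality(join_cardinality, filters):
    """Scale a join's estimated cardinality by its join filter selectivities.

    `filters` is a list of (selectivity, is_correlated) pairs. Both the
    representation and the independence assumption (selectivities multiply)
    are hypothetical, chosen only to illustrate the idea.
    """
    # Skip correlated filters: their selectivity is assumed to be accounted
    # for elsewhere, so applying it again would double count.
    combined = prod(sel for sel, is_correlated in filters if not is_correlated)
    return round(join_cardinality * combined)


# Join 10 from the plan above: one non-correlated filter, selectivity ~0.16.
print(adjusted_join_cardinality(575_770, [(0.16, False)]))

# A correlated filter leaves the estimate unchanged.
print(adjusted_join_cardinality(575_770, [(0.16, True)]))
```

The point of the sketch is only the shape of the rule: apply filter selectivity to the join's output estimate by default, and skip the filters whose effect is already reflected in the inputs.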



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)