You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2018/02/23 21:14:00 UTC

[jira] [Created] (IMPALA-6575) Avoid double-counting of predicates in join cardinality estimation

Alexander Behm created IMPALA-6575:
--------------------------------------

             Summary: Avoid double-counting of predicates in join cardinality estimation
                 Key: IMPALA-6575
                 URL: https://issues.apache.org/jira/browse/IMPALA-6575
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 2.11.0, Impala 2.10.0, Impala 2.9.0, Impala 2.8.0, Impala 2.7.0, Impala 2.6.0, Impala 2.5.0
            Reporter: Alexander Behm


The cardinality of an inner join may be significantly underestimated if (1) an equivalent predicate exists on both join inputs, (2) the join condition involves the same column as that predicate, and (3) Impala believes the join to be FK/PK.

The reason for this underestimation is that the planner double-counts the selectivity of predicates on the join input:
* First, the selectivity reduces the cardinality of the join input
* Second, since the join is FK/PK, the build-side selectivity is applied to the join cardinality
This second adjustment is not correct in this specific situation because the predicate selectivity has already been applied to the probe-side join input.

Example:
{code}
explain select count(*) from functional.alltypes a join functional.alltypes b on a.id = b.id and a.id < 10 and b.id < 10;
+------------------------------------------------------------------------------------------------+
| Explain String                                                                                 |
+------------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.00MB                                               |
| Per-Host Resource Estimates: Memory=279.94MB                                                   |
| Codegen disabled by planner                                                                    |
|                                                                                                |
| F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                          |
| |  Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B                                 |
| PLAN-ROOT SINK                                                                                 |
| |  mem-estimate=0B mem-reservation=0B                                                          |
| |                                                                                              |
| 07:AGGREGATE [FINALIZE]                                                                        |
| |  output: count:merge(*)                                                                      |
| |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB                                 |
| |  tuple-ids=2 row-size=8B cardinality=1                                                       |
| |                                                                                              |
| 06:EXCHANGE [UNPARTITIONED]                                                                    |
| |  mem-estimate=0B mem-reservation=0B                                                          |
| |  tuple-ids=2 row-size=8B cardinality=1                                                       |
| |                                                                                              |
| F02:PLAN FRAGMENT [HASH(a.id)] hosts=3 instances=3                                             |
| Per-Host Resources: mem-estimate=12.94MB mem-reservation=2.94MB runtime-filters-memory=1.00MB  |
| 03:AGGREGATE                                                                                   |
| |  output: count(*)                                                                            |
| |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB                                 |
| |  tuple-ids=2 row-size=8B cardinality=1                                                       |
| |                                                                                              |
| 02:HASH JOIN [INNER JOIN, PARTITIONED]                                                         |
| |  hash predicates: a.id = b.id                                                                |
| |  fk/pk conjuncts: a.id = b.id                                                                |
| |  runtime filters: RF000[bloom] <- b.id                                                       |
| |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB                             |
| |  tuple-ids=0,1 row-size=8B cardinality=73       <--- should be 730                                             |
| |                                                                                              |
| |--05:EXCHANGE [HASH(b.id)]                                                                    |
| |  |  mem-estimate=0B mem-reservation=0B                                                       |
| |  |  tuple-ids=1 row-size=4B cardinality=730                                                  |
| |  |                                                                                           |
| |  F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                              |
| |  Per-Host Resources: mem-estimate=128.00MB mem-reservation=32.00KB                           |
| |  01:SCAN HDFS [functional.alltypes b, RANDOM]                                                |
| |     partitions=24/24 files=24 size=478.45KB                                                  |
| |     predicates: b.id < 10                                                                    |
| |     stored statistics:                                                                       |
| |       table: rows=7300 size=478.45KB                                                         |
| |       partitions: 24/24 rows=7300                                                            |
| |       columns: all                                                                           |
| |     extrapolated-rows=disabled                                                               |
| |     parquet dictionary predicates: b.id < 10                                                 |
| |     mem-estimate=128.00MB mem-reservation=32.00KB                                            |
| |     tuple-ids=1 row-size=4B cardinality=730                                                  |
| |                                                                                              |
| 04:EXCHANGE [HASH(a.id)]                                                                       |
| |  mem-estimate=0B mem-reservation=0B                                                          |
| |  tuple-ids=0 row-size=4B cardinality=730                                                     |
| |                                                                                              |
| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                                 |
| Per-Host Resources: mem-estimate=129.00MB mem-reservation=1.03MB runtime-filters-memory=1.00MB |
| 00:SCAN HDFS [functional.alltypes a, RANDOM]                                                   |
|    partitions=24/24 files=24 size=478.45KB                                                     |
|    predicates: a.id < 10                                                                       |
|    runtime filters: RF000[bloom] -> a.id                                                       |
|    stored statistics:                                                                          |
|      table: rows=7300 size=478.45KB                                                            |
|      partitions: 24/24 rows=7300                                                               |
|      columns: all                                                                              |
|    extrapolated-rows=disabled                                                                  |
|    parquet dictionary predicates: a.id < 10                                                    |
|    mem-estimate=128.00MB mem-reservation=32.00KB                                               |
|    tuple-ids=0 row-size=4B cardinality=730                                                     |
+------------------------------------------------------------------------------------------------+
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)