You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2019/08/27 23:52:00 UTC
[jira] [Comment Edited] (IMPALA-2945) Aggregation cardinality estimates do not take into account data distribution

    [ https://issues.apache.org/jira/browse/IMPALA-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917276#comment-16917276 ] 

Tim Armstrong edited comment on IMPALA-2945 at 8/27/19 11:51 PM:
-----------------------------------------------------------------

I looked at this and it looks like the general problem is that the cardinality estimates just don't take into account data distribution for preaggregations at all, but the memory estimates divide by # of hosts.
{noformat}
[localhost:21000] tpch_parquet> show column stats lineitem;
Query: show column stats lineitem
+-----------------+---------------+------------------+--------+----------+-------------------+
| Column          | Type          | #Distinct Values | #Nulls | Max Size | Avg Size          |
+-----------------+---------------+------------------+--------+----------+-------------------+
| l_orderkey      | BIGINT        | 1563438          | 0      | 8        | 8                 |
| l_partkey       | BIGINT        | 200516           | 0      | 8        | 8                 |
| l_suppkey       | BIGINT        | 9712             | 0      | 8        | 8                 |
| l_linenumber    | INT           | 7                | 0      | 4        | 4                 |
| l_quantity      | DECIMAL(12,2) | 51               | 0      | 8        | 8                 |
| l_extendedprice | DECIMAL(12,2) | 868550           | 0      | 8        | 8                 |
| l_discount      | DECIMAL(12,2) | 11               | 0      | 8        | 8                 |
| l_tax           | DECIMAL(12,2) | 9                | 0      | 8        | 8                 |
| l_returnflag    | STRING        | 3                | 0      | 1        | 1                 |
| l_linestatus    | STRING        | 2                | 0      | 1        | 1                 |
| l_shipdate      | STRING        | 2629             | 0      | 10       | 10                |
| l_commitdate    | STRING        | 2559             | 0      | 10       | 10                |
| l_receiptdate   | STRING        | 2658             | 0      | 10       | 10                |
| l_shipinstruct  | STRING        | 4                | 0      | 17       | 11.99863815307617 |
| l_shipmode      | STRING        | 7                | 0      | 7        | 4.285304546356201 |
| l_comment       | STRING        | 4652622          | 0      | 43       | 26.49416923522949 |
+-----------------+---------------+------------------+--------+----------+-------------------+
Query: explain select sum(l_extendedprice) / 7.0 as avg_yearly
from
  lineitem,
  part
where
  p_partkey = l_partkey
  and p_brand = 'Brand#23'
  and p_container = 'MED BOX'   
  and l_quantity < (
    select
      0.2 * avg(l_quantity)
    from
      lineitem
    where
      l_partkey = p_partkey
  )
+--------------------------------------------------------------------+
| Explain String                                                     |
+--------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=40.75MB Threads=8        |
| Per-Host Resource Estimates: Memory=495MB                          |
|                                                                    |
| PLAN-ROOT SINK                                                     |
| |                                                                  |
| 12:AGGREGATE [FINALIZE]                                            |
| |  output: sum:merge(l_extendedprice)                              |
| |  row-size=16B cardinality=1                                      |
| |                                                                  |
| 11:EXCHANGE [UNPARTITIONED]                                        |
| |                                                                  |
| 06:AGGREGATE                                                       |
| |  output: sum(l_extendedprice)                                    |
| |  row-size=16B cardinality=1                                      |
| |                                                                  |
| 05:HASH JOIN [RIGHT SEMI JOIN, PARTITIONED]                        |
| |  hash predicates: l_partkey = p_partkey                          |
| |  other join predicates: l_quantity < 0.2 * avg(l_quantity)       |
| |  runtime filters: RF000 <- p_partkey                             |
| |  row-size=72B cardinality=29.93K                                 |
| |                                                                  |
| |--10:EXCHANGE [HASH(p_partkey)]                                   |
| |  |                                                               |
| |  04:HASH JOIN [INNER JOIN, BROADCAST]                            |
| |  |  hash predicates: l_partkey = p_partkey                       |
| |  |  runtime filters: RF002 <- p_partkey                          |
| |  |  row-size=72B cardinality=29.93K                              |
| |  |                                                               |
| |  |--09:EXCHANGE [BROADCAST]                                      |
| |  |  |                                                            |
| |  |  01:SCAN HDFS [tpch_parquet.part]                             |
| |  |     HDFS partitions=1/1 files=1 size=6.24MB                   |
| |  |     predicates: p_container = 'MED BOX', p_brand = 'Brand#23' |
| |  |     row-size=48B cardinality=1.00K                            |
| |  |                                                               |
| |  00:SCAN HDFS [tpch_parquet.lineitem]                            |
| |     HDFS partitions=1/1 files=3 size=193.98MB                    |
| |     runtime filters: RF002 -> l_partkey                          |
| |     row-size=24B cardinality=6.00M                               |
| |                                                                  |
| 08:AGGREGATE [FINALIZE]                                            |
| |  output: avg:merge(l_quantity)                                   |
| |  group by: l_partkey                                             |
| |  row-size=16B cardinality=200.52K                                |
| |                                                                  |
| 07:EXCHANGE [HASH(l_partkey)]                                      |
| |                                                                  |
| 03:AGGREGATE [STREAMING]                                           |
| |  output: avg(l_quantity)                                         |
| |  group by: l_partkey                                             |
| |  row-size=16B cardinality=200.52K                                |
| |                                                                  |
| 02:SCAN HDFS [tpch_parquet.lineitem]                               |
|    HDFS partitions=1/1 files=3 size=193.98MB                       |
|    runtime filters: RF000 -> tpch_parquet.lineitem.l_partkey       |
|    row-size=16B cardinality=6.00M                                  |
+--------------------------------------------------------------------+
Fetched 56 row(s) in 0.01s
{noformat}

With the memory-based admission control improvements to set max mem limit and some of the other fixes in IMPALA-7604 it might be worth taking another look. I'm still concerned about two wrongs making a right when we are grouping by multiple keys and overestimating cardinality.

That suggests we could consider coupling this change with another change to address the overestimation when grouping by multiple keys.


was (Author: tarmstrong):
I looked at this and it looks like the general problem is that the cardinality estimates just don't take into account data distribution for preaggregations at all, but the memory estimates divide by # of hosts.
{noformat}
[localhost:21000] tpch_parquet> show column stats lineitem;
Query: show column stats lineitem
+-----------------+---------------+------------------+--------+----------+-------------------+
| Column          | Type          | #Distinct Values | #Nulls | Max Size | Avg Size          |
+-----------------+---------------+------------------+--------+----------+-------------------+
| l_orderkey      | BIGINT        | 1563438          | 0      | 8        | 8                 |
| l_partkey       | BIGINT        | 200516           | 0      | 8        | 8                 |
| l_suppkey       | BIGINT        | 9712             | 0      | 8        | 8                 |
| l_linenumber    | INT           | 7                | 0      | 4        | 4                 |
| l_quantity      | DECIMAL(12,2) | 51               | 0      | 8        | 8                 |
| l_extendedprice | DECIMAL(12,2) | 868550           | 0      | 8        | 8                 |
| l_discount      | DECIMAL(12,2) | 11               | 0      | 8        | 8                 |
| l_tax           | DECIMAL(12,2) | 9                | 0      | 8        | 8                 |
| l_returnflag    | STRING        | 3                | 0      | 1        | 1                 |
| l_linestatus    | STRING        | 2                | 0      | 1        | 1                 |
| l_shipdate      | STRING        | 2629             | 0      | 10       | 10                |
| l_commitdate    | STRING        | 2559             | 0      | 10       | 10                |
| l_receiptdate   | STRING        | 2658             | 0      | 10       | 10                |
| l_shipinstruct  | STRING        | 4                | 0      | 17       | 11.99863815307617 |
| l_shipmode      | STRING        | 7                | 0      | 7        | 4.285304546356201 |
| l_comment       | STRING        | 4652622          | 0      | 43       | 26.49416923522949 |
+-----------------+---------------+------------------+--------+----------+-------------------+
Query: explain select sum(l_extendedprice) / 7.0 as avg_yearly
from
  lineitem,
  part
where
  p_partkey = l_partkey
  and p_brand = 'Brand#23'
  and p_container = 'MED BOX'   
  and l_quantity < (
    select
      0.2 * avg(l_quantity)
    from
      lineitem
    where
      l_partkey = p_partkey
  )
+--------------------------------------------------------------------+
| Explain String                                                     |
+--------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=40.75MB Threads=8        |
| Per-Host Resource Estimates: Memory=495MB                          |
|                                                                    |
| PLAN-ROOT SINK                                                     |
| |                                                                  |
| 12:AGGREGATE [FINALIZE]                                            |
| |  output: sum:merge(l_extendedprice)                              |
| |  row-size=16B cardinality=1                                      |
| |                                                                  |
| 11:EXCHANGE [UNPARTITIONED]                                        |
| |                                                                  |
| 06:AGGREGATE                                                       |
| |  output: sum(l_extendedprice)                                    |
| |  row-size=16B cardinality=1                                      |
| |                                                                  |
| 05:HASH JOIN [RIGHT SEMI JOIN, PARTITIONED]                        |
| |  hash predicates: l_partkey = p_partkey                          |
| |  other join predicates: l_quantity < 0.2 * avg(l_quantity)       |
| |  runtime filters: RF000 <- p_partkey                             |
| |  row-size=72B cardinality=29.93K                                 |
| |                                                                  |
| |--10:EXCHANGE [HASH(p_partkey)]                                   |
| |  |                                                               |
| |  04:HASH JOIN [INNER JOIN, BROADCAST]                            |
| |  |  hash predicates: l_partkey = p_partkey                       |
| |  |  runtime filters: RF002 <- p_partkey                          |
| |  |  row-size=72B cardinality=29.93K                              |
| |  |                                                               |
| |  |--09:EXCHANGE [BROADCAST]                                      |
| |  |  |                                                            |
| |  |  01:SCAN HDFS [tpch_parquet.part]                             |
| |  |     HDFS partitions=1/1 files=1 size=6.24MB                   |
| |  |     predicates: p_container = 'MED BOX', p_brand = 'Brand#23' |
| |  |     row-size=48B cardinality=1.00K                            |
| |  |                                                               |
| |  00:SCAN HDFS [tpch_parquet.lineitem]                            |
| |     HDFS partitions=1/1 files=3 size=193.98MB                    |
| |     runtime filters: RF002 -> l_partkey                          |
| |     row-size=24B cardinality=6.00M                               |
| |                                                                  |
| 08:AGGREGATE [FINALIZE]                                            |
| |  output: avg:merge(l_quantity)                                   |
| |  group by: l_partkey                                             |
| |  row-size=16B cardinality=200.52K                                |
| |                                                                  |
| 07:EXCHANGE [HASH(l_partkey)]                                      |
| |                                                                  |
| 03:AGGREGATE [STREAMING]                                           |
| |  output: avg(l_quantity)                                         |
| |  group by: l_partkey                                             |
| |  row-size=16B cardinality=200.52K                                |
| |                                                                  |
| 02:SCAN HDFS [tpch_parquet.lineitem]                               |
|    HDFS partitions=1/1 files=3 size=193.98MB                       |
|    runtime filters: RF000 -> tpch_parquet.lineitem.l_partkey       |
|    row-size=16B cardinality=6.00M                                  |
+--------------------------------------------------------------------+
Fetched 56 row(s) in 0.01s
{noformat}

With the memory-based admission control improvements to set max mem limit and some of the other fixes in IMPALA-7604 it might be worth taking another look. I'm still concerned about two wrongs making a right when we are grouping by multiple keys and overestimating cardinality.

> Aggregation cardinality estimates do not take into account data distribution
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-2945
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2945
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.0, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
>            Reporter: Mostafa Mokhtar
>            Priority: Major
>              Labels: planner, resource-management
>
> When computing the per-host memory estimate for local aggregations, the planner does not take into account that data is randomly distributed across nodes leading to significant underestimation in some cases. The suggested fix is to use min(agg input cardinality, NDV * #hosts) as the per-node cardinality used for the per-node memory estimate.
> Impact: In the query below, the planner significantly underestimates the per-node memory of agg node 03 to be 3.8GB but the actual is 24.77.
> Query
> {code}
> select sum(l_extendedprice) / 7.0 as avg_yearly
> from
>   lineitem,
>   part
> where
>   p_partkey = l_partkey
>   and p_brand = 'Brand#23'
>   and p_container = 'MED BOX'
>   and l_quantity < (
>     select
>       0.2 * avg(l_quantity)
>     from
>       lineitem
>     where
>       l_partkey = p_partkey
>   )
> {code}
> Plan
> {code}
> 12:AGGREGATE [FINALIZE]
> |  output: sum:merge(l_extendedprice)
> |  hosts=20 per-host-mem=unavailable
> |  tuple-ids=6 row-size=16B cardinality=1
> |
> 11:EXCHANGE [UNPARTITIONED]
> |  hosts=20 per-host-mem=unavailable
> |  tuple-ids=6 row-size=16B cardinality=1
> |
> 06:AGGREGATE
> |  output: sum(l_extendedprice)
> |  hosts=20 per-host-mem=10.00MB
> |  tuple-ids=6 row-size=16B cardinality=1
> |
> 05:HASH JOIN [RIGHT SEMI JOIN, PARTITIONED]
> |  hash predicates: l_partkey = p_partkey
> |  other join predicates: l_quantity < 0.2 * avg(l_quantity)
> |  hosts=20 per-host-mem=125.18MB
> |  tuple-ids=0,1 row-size=80B cardinality=29992141
> |
> |--10:EXCHANGE [HASH(p_partkey)]
> |  |  hosts=20 per-host-mem=0B
> |  |  tuple-ids=0,1 row-size=80B cardinality=29992141
> |  |
> |  04:HASH JOIN [INNER JOIN, BROADCAST]
> |  |  hash predicates: l_partkey = p_partkey
> |  |  hosts=20 per-host-mem=58.30MB
> |  |  tuple-ids=0,1 row-size=80B cardinality=29992141
> |  |
> |  |--09:EXCHANGE [BROADCAST]
> |  |  |  hosts=20 per-host-mem=0B
> |  |  |  tuple-ids=1 row-size=56B cardinality=1000000
> |  |  |
> |  |  01:SCAN HDFS [tpch_1000_decimal_parquet.part, RANDOM]
> |  |     partitions=1/1 files=40 size=6.38GB
> |  |     predicates: p_brand = 'Brand#23', p_container = 'MED BOX'
> |  |     table stats: 200000000 rows total
> |  |     column stats: all
> |  |     hosts=20 per-host-mem=264.00MB
> |  |     tuple-ids=1 row-size=56B cardinality=1000000
> |  |
> |  00:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM]
> |     partitions=1/1 files=880 size=216.61GB
> |     table stats: 5999989709 rows total
> |     column stats: all
> |     hosts=20 per-host-mem=264.00MB
> |     tuple-ids=0 row-size=24B cardinality=5999989709
> |
> 08:AGGREGATE [FINALIZE]
> |  output: avg:merge(l_quantity)
> |  group by: l_partkey
> |  hosts=20 per-host-mem=167.89MB
> |  tuple-ids=4 row-size=16B cardinality=200052064
> |
> 07:EXCHANGE [HASH(l_partkey)]
> |  hosts=20 per-host-mem=0B
> |  tuple-ids=3 row-size=16B cardinality=200052064
> |
> 03:AGGREGATE
> |  output: avg(l_quantity)
> |  group by: l_partkey
> |  hosts=20 per-host-mem=3.28GB
> |  tuple-ids=3 row-size=16B cardinality=200052064
> |
> 02:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM]
>    partitions=1/1 files=880 size=216.61GB
>    table stats: 5999989709 rows total
>    column stats: all
>    hosts=20 per-host-mem=176.00MB
>    tuple-ids=2 row-size=16B cardinality=5999989709
> {code}
> Summary
> |Operator             ||#Hosts||   Avg Time||   Max Time||    #Rows  ||Est. #Rows||   Peak Mem  ||Est. Peak Mem  ||Detail                   |      
> |12:AGGREGATE              |1  |256.620ms|  256.620ms|        1|           1|   92.00 KB|        -1.00 B|  FINALIZE                       |
> |11:EXCHANGE               |1  |184.430us|  184.430us|       20|           1|          0|        -1.00 B|  UNPARTITIONED                  |
> |06:AGGREGATE             |20  |364.045ms|    1s508ms|       20|           1|    9.37 MB|       10.00 MB|                                 |
> |05:HASH JOIN             |20  |279.175ms|  304.600ms|  523.09K|      29.99M|  155.04 MB|      125.18 MB|  RIGHT SEMI JOIN, PARTITIONED   |
> |I--10:EXCHANGE           |20   |22.448ms|   32.954ms|    5.98M|      29.99M|          0|              0|  HASH(p_partkey)                |
> |I  04:HASH JOIN          |20   |25s417ms|   35s579ms|    5.98M|      29.99M|  146.02 MB|       58.30 MB|  INNER JOIN, BROADCAST          |
> |I  I--09:EXCHANGE        |20   |16.270ms|   35.329ms|  199.30K|       1.00M|          0|              0|  BROADCAST                     | 
> |I  I  01:SCAN HDFS       |20  |218.505ms|  331.299ms|  199.30K|       1.00M|  173.43 MB|      264.00 MB|  tpch_1000_decimal_parquet.part| 
> |I  00:SCAN HDFS          |20    |1s365ms|    1s822ms|    6.00B|       6.00B|    1.92 GB|      264.00 MB|  tpch_1000_decimal_parquet.l... |
> |08:AGGREGATE             |20   |29s706ms|   35s917ms|  200.00M|     200.05M|    1.64 GB|      167.89 MB|  FINALIZE|                       
> |07:EXCHANGE              |20    |5s081ms|    8s410ms|    3.11B|     200.05M|          0|              0|  HASH(l_partkey)|                
> |03:AGGREGATE             |20      |4m10s      5m12s |   3.11B|     200.05M|  24.77 GB|        3.28 GB|                                 
> |02:SCAN HDFS             |20    |1s544ms|    2s517ms|    6.00B|       6.00B|  838.85 MB|      176.00 MB|  tpch_1000_decimal_parquet.l... |



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org