Posted to issues-all@impala.apache.org by "Pooja Nilangekar (JIRA)" <ji...@apache.org> on 2018/10/24 20:43:00 UTC

[jira] [Assigned] (IMPALA-7749) Merge aggregation node memory estimate is incorrectly influenced by limit

     [ https://issues.apache.org/jira/browse/IMPALA-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pooja Nilangekar reassigned IMPALA-7749:
----------------------------------------

    Assignee: Pooja Nilangekar  (was: Bikramjeet Vig)

> Merge aggregation node memory estimate is incorrectly influenced by limit
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-7749
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7749
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>    Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0
>            Reporter: Tim Armstrong
>            Assignee: Pooja Nilangekar
>            Priority: Critical
>
> In the query below, the memory estimate for node ID 3 (the merge aggregation, 03:AGGREGATE [FINALIZE]) is too low. If you remove the limit, the estimate is correct.
> {noformat}
> [localhost:21000] default> set explain_level=2; explain select l_orderkey, l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5;
> EXPLAIN_LEVEL set to 2
> Query: explain select l_orderkey, l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5
> +-------------------------------------------------------------------------------------------+
> | Explain String                                                                            |
> +-------------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=43.94MB Threads=4                               |
> | Per-Host Resource Estimates: Memory=450MB                                                 |
> |                                                                                           |
> | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                     |
> | |  Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1            |
> | PLAN-ROOT SINK                                                                            |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                                |
> | |                                                                                         |
> | 04:EXCHANGE [UNPARTITIONED]                                                               |
> | |  limit: 5                                                                               |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                                |
> | |  tuple-ids=1 row-size=28B cardinality=5                                                 |
> | |  in pipelines: 03(GETNEXT)                                                              |
> | |                                                                                         |
> | F01:PLAN FRAGMENT [HASH(l_orderkey,l_partkey,l_linenumber)] hosts=3 instances=3           |
> | Per-Host Resources: mem-estimate=10.00MB mem-reservation=1.94MB thread-reservation=1      |
> | 03:AGGREGATE [FINALIZE]                                                                   |
> | |  output: count:merge(*)                                                                 |
> | |  group by: l_orderkey, l_partkey, l_linenumber                                          |
> | |  limit: 5                                                                               |
> | |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0  |
> | |  tuple-ids=1 row-size=28B cardinality=5                                                 |
> | |  in pipelines: 03(GETNEXT), 00(OPEN)                                                    |
> | |                                                                                         |
> | 02:EXCHANGE [HASH(l_orderkey,l_partkey,l_linenumber)]                                     |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                                |
> | |  tuple-ids=1 row-size=28B cardinality=6001215                                           |
> | |  in pipelines: 00(GETNEXT)                                                              |
> | |                                                                                         |
> | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                            |
> | Per-Host Resources: mem-estimate=440.27MB mem-reservation=42.00MB thread-reservation=2    |
> | 01:AGGREGATE [STREAMING]                                                                  |
> | |  output: count(*)                                                                       |
> | |  group by: l_orderkey, l_partkey, l_linenumber                                          |
> | |  mem-estimate=176.27MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0 |
> | |  tuple-ids=1 row-size=28B cardinality=6001215                                           |
> | |  in pipelines: 00(GETNEXT)                                                              |
> | |                                                                                         |
> | 00:SCAN HDFS [tpch.lineitem, RANDOM]                                                      |
> |    partitions=1/1 files=1 size=718.94MB                                                   |
> |    stored statistics:                                                                     |
> |      table: rows=6001215 size=718.94MB                                                    |
> |      columns: all                                                                         |
> |    extrapolated-rows=disabled max-scan-range-rows=1068457                                 |
> |    mem-estimate=264.00MB mem-reservation=8.00MB thread-reservation=1                      |
> |    tuple-ids=0 row-size=20B cardinality=6001215                                           |
> |    in pipelines: 00(GETNEXT)                                                              |
> +-------------------------------------------------------------------------------------------+
> {noformat}
> The bug is that we use cardinality_ to cap the estimated number of distinct values, but cardinality_ has itself already been capped at the output limit (5 in this query).
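
The effect described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Impala's planner code: the function names, the "groups times row size" formula, and the 10 MB floor are all hypothetical, chosen to roughly mirror the numbers in the plan (row-size=28B, cardinality=6001215, limit 5, mem-estimate=10.00MB). It treats the pre-limit input cardinality as the bound on the number of groups.

```python
# Toy model of the estimate described in the issue. All names and the
# formula are illustrative assumptions, not Impala's actual implementation.

ROW_SIZE_BYTES = 28                  # row-size=28B from the plan
INPUT_CARDINALITY = 6001215          # cardinality feeding node 03
LIMIT = 5                            # "limit 5" in the query
ASSUMED_FLOOR_BYTES = 10 * 1024 * 1024  # assumed minimum estimate (10 MB)


def apply_limit(cardinality, limit):
    """Cap an output cardinality at the query limit (None means no limit)."""
    return cardinality if limit is None else min(cardinality, limit)


def agg_mem_estimate(num_groups, row_size_bytes, floor_bytes):
    """Hypothetical estimate: one row per group, never below the floor."""
    return max(num_groups * row_size_bytes, floor_bytes)


# Buggy variant: the group count is bounded by the output cardinality,
# which has already been capped at the limit.
limited_cardinality = apply_limit(INPUT_CARDINALITY, LIMIT)
buggy_estimate = agg_mem_estimate(
    limited_cardinality, ROW_SIZE_BYTES, ASSUMED_FLOOR_BYTES)

# Intended variant: the group count is bounded by the pre-limit cardinality,
# because the node must still build state for every group it sees.
intended_estimate = agg_mem_estimate(
    INPUT_CARDINALITY, ROW_SIZE_BYTES, ASSUMED_FLOOR_BYTES)

print("buggy estimate (bytes):   ", buggy_estimate)
print("intended estimate (bytes):", intended_estimate)
```

In this model the limit-capped variant collapses to the assumed floor, while the pre-limit variant scales with the number of groups. The absolute values are not expected to match Impala's real estimates, which also depend on factors this sketch ignores (for example, the number of instances).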
