Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/10/24 00:21:00 UTC
[jira] [Created] (IMPALA-7749) Merge aggregation node memory estimate is incorrectly influenced by limit
Tim Armstrong created IMPALA-7749:
-------------------------------------
Summary: Merge aggregation node memory estimate is incorrectly influenced by limit
Key: IMPALA-7749
URL: https://issues.apache.org/jira/browse/IMPALA-7749
Project: IMPALA
Issue Type: Sub-task
Components: Frontend
Affects Versions: Impala 2.12.0, Impala 3.0, Impala 2.11.0, Impala 3.1.0
Reporter: Tim Armstrong
Assignee: Bikramjeet Vig
In the query below, the memory estimate for node 03 is too low. If the limit is removed, the estimate is correct.
{noformat}
[localhost:21000] default> set explain_level=2; explain select l_orderkey, l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5;
EXPLAIN_LEVEL set to 2
Query: explain select l_orderkey, l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5
+-------------------------------------------------------------------------------------------+
| Explain String |
+-------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=43.94MB Threads=4 |
| Per-Host Resource Estimates: Memory=450MB |
| |
| F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
| | Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1 |
| PLAN-ROOT SINK |
| | mem-estimate=0B mem-reservation=0B thread-reservation=0 |
| | |
| 04:EXCHANGE [UNPARTITIONED] |
| | limit: 5 |
| | mem-estimate=0B mem-reservation=0B thread-reservation=0 |
| | tuple-ids=1 row-size=28B cardinality=5 |
| | in pipelines: 03(GETNEXT) |
| | |
| F01:PLAN FRAGMENT [HASH(l_orderkey,l_partkey,l_linenumber)] hosts=3 instances=3 |
| Per-Host Resources: mem-estimate=10.00MB mem-reservation=1.94MB thread-reservation=1 |
| 03:AGGREGATE [FINALIZE] |
| | output: count:merge(*) |
| | group by: l_orderkey, l_partkey, l_linenumber |
| | limit: 5 |
| | mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0 |
| | tuple-ids=1 row-size=28B cardinality=5 |
| | in pipelines: 03(GETNEXT), 00(OPEN) |
| | |
| 02:EXCHANGE [HASH(l_orderkey,l_partkey,l_linenumber)] |
| | mem-estimate=0B mem-reservation=0B thread-reservation=0 |
| | tuple-ids=1 row-size=28B cardinality=6001215 |
| | in pipelines: 00(GETNEXT) |
| | |
| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 |
| Per-Host Resources: mem-estimate=440.27MB mem-reservation=42.00MB thread-reservation=2 |
| 01:AGGREGATE [STREAMING] |
| | output: count(*) |
| | group by: l_orderkey, l_partkey, l_linenumber |
| | mem-estimate=176.27MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0 |
| | tuple-ids=1 row-size=28B cardinality=6001215 |
| | in pipelines: 00(GETNEXT) |
| | |
| 00:SCAN HDFS [tpch.lineitem, RANDOM] |
| partitions=1/1 files=1 size=718.94MB |
| stored statistics: |
| table: rows=6001215 size=718.94MB |
| columns: all |
| extrapolated-rows=disabled max-scan-range-rows=1068457 |
| mem-estimate=264.00MB mem-reservation=8.00MB thread-reservation=1 |
| tuple-ids=0 row-size=20B cardinality=6001215 |
| in pipelines: 00(GETNEXT) |
+-------------------------------------------------------------------------------------------+
{noformat}
The bug is that we use cardinality_ to cap the number of distinct values, but cardinality_ is already capped at the output limit. So in the plan above, node 03's estimate is sized for only 5 groups (cardinality=5) even though 6,001,215 rows enter the merge aggregation, which must still build a hash table over all distinct groups before the limit is applied.
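The effect can be illustrated with a minimal sketch. This is not Impala's actual planner code (the frontend is Java and the real estimate involves per-bucket and spill-buffer accounting); the function names, the 10 MB floor, and the simple groups-times-row-size model are all assumptions chosen only to show how capping at the post-limit cardinality collapses the estimate:

```python
# Hypothetical model of a merge aggregation's memory estimate.
# All names and constants are illustrative, not Impala internals.

MIN_ESTIMATE_BYTES = 10 * 1024 * 1024  # assumed 10 MB floor, matching the 10.00MB seen in the plan

def agg_mem_estimate_buggy(distinct_groups, row_size_bytes, node_cardinality):
    """Buggy behavior: cap the group count at the node's output cardinality,
    which a LIMIT has already reduced (to 5 in the example plan)."""
    groups = min(distinct_groups, node_cardinality)
    return max(groups * row_size_bytes, MIN_ESTIMATE_BYTES)

def agg_mem_estimate_fixed(distinct_groups, row_size_bytes, input_cardinality):
    """Intended behavior: cap at the pre-limit input cardinality, since the
    hash table must hold every distinct group before the limit applies."""
    groups = min(distinct_groups, input_cardinality)
    return max(groups * row_size_bytes, MIN_ESTIMATE_BYTES)

# Numbers from the example plan: 6,001,215 input rows, 28-byte rows, limit 5.
buggy = agg_mem_estimate_buggy(6001215, 28, 5)
fixed = agg_mem_estimate_fixed(6001215, 28, 6001215)
print(buggy)  # collapses to the 10 MB floor
print(fixed)  # sized for all ~6M groups
```

With the limit-capped cardinality, the estimate bottoms out at the floor (the 10.00MB mem-estimate on node 03 above), while sizing for the full group count yields an estimate on the order of the streaming pre-aggregation's 176.27MB.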
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org