Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2017/08/23 23:38:14 UTC
[Impala-ASF-CR] IMPALA-5648: fix count(*) mem estimate regression
Tim Armstrong has uploaded a new patch set (#3).
Change subject: IMPALA-5648: fix count(*) mem estimate regression
......................................................................
IMPALA-5648: fix count(*) mem estimate regression
The metadata-only scan doesn't allocate I/O buffers, contrary to
an assumption of the memory estimation code in the planner.
This fix also sets a floor on the memory estimate, to avoid
estimating 0 bytes. 1MB seems like a reasonable approximation:
I ran metadata-only scans on a few different data sizes and
saw memory consumption ranging from 128KB to 1MB.
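As a rough illustration of the two changes described above (no I/O buffer
memory for a metadata-only scan, plus a 1MB floor), here is a minimal Java
sketch. It is not the code from HdfsScanNode.java: the class name, method
name, and parameters are all hypothetical, and the inputs in main() are
illustrative values only.

```java
// Hypothetical sketch only; names and parameters are assumptions, not Impala's API.
final class ScanMemEstimateSketch {
    // Floor on the estimate, using the 1MB value given in the commit message.
    static final long MIN_ESTIMATE_BYTES = 1024L * 1024L;

    /**
     * Estimates scan memory in bytes.
     *
     * @param metadataOnly  true if the scan reads only file metadata (hypothetical flag)
     * @param numIoBuffers  assumed number of I/O buffers for a regular scan
     * @param ioBufferBytes assumed size of each I/O buffer in bytes
     */
    static long estimateScanMemBytes(boolean metadataOnly, long numIoBuffers,
                                     long ioBufferBytes) {
        // A metadata-only scan allocates no I/O buffers, so it contributes 0 here.
        long ioBytes = metadataOnly ? 0L : Math.multiplyExact(numIoBuffers, ioBufferBytes);
        // Apply the floor so the estimate is never 0 bytes.
        return Math.max(ioBytes, MIN_ESTIMATE_BYTES);
    }

    public static void main(String[] args) {
        long eightMb = 8L * 1024L * 1024L;
        // Metadata-only scan: the floor applies.
        System.out.println(estimateScanMemBytes(true, 10, eightMb));
        // Regular scan: the I/O buffer term dominates.
        System.out.println(estimateScanMemBytes(false, 10, eightMb));
    }
}
```

The floor is applied last so that any scan whose computed estimate falls
below 1MB, metadata-only or not, still reports a non-zero value.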
The estimate is now much closer to actual consumption
(it was 80MB before):
[localhost:21000] > select count(*) from tpch_parquet.lineitem; summary;
Query: select count(*) from tpch_parquet.lineitem
Query submitted at: 2017-08-23 11:58:29 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=cb4b8d41fc838c9a:c5496ff300000000
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
Fetched 1 row(s) in 0.13s
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+
| 03:AGGREGATE | 1      | 168.49us | 168.49us | 1     | 1          | 28.00 KB  | 10.00 MB      | FINALIZE              |
| 02:EXCHANGE  | 1      | 30.11ms  | 30.11ms  | 3     | 1          | 0 B       | 0 B           | UNPARTITIONED         |
| 01:AGGREGATE | 3      | 2.05us   | 6.14us   | 3     | 1          | 20.00 KB  | 10.00 MB      |                       |
| 00:SCAN HDFS | 3      | 4.58ms   | 4.72ms   | 3     | 6.00M      | 128.00 KB | 1.00 MB       | tpch_parquet.lineitem |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+
Testing:
Updated affected planner tests.
Change-Id: Iaf5c2316bef2afae54a94245c715534ed294f286
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
3 files changed, 21 insertions(+), 10 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7783/3
--
To view, visit http://gerrit.cloudera.org:8080/7783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaf5c2316bef2afae54a94245c715534ed294f286
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>