Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2017/08/23 23:38:14 UTC

[Impala-ASF-CR] IMPALA-5648: fix count(*) mem estimate regression

Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-5648: fix count(*) mem estimate regression
......................................................................

IMPALA-5648: fix count(*) mem estimate regression

The metadata-only scan doesn't allocate I/O buffers, contrary to
an assumption of the memory estimation code in the planner.

This fix also sets a floor on the memory estimate, to avoid
estimating 0 bytes. 1 MB seems like a reasonable floor:
I ran metadata-only scans on a few different data sizes and
saw memory consumption from 128 KB to 1 MB.
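
For readers without the diff at hand, the two changes described above can be sketched roughly as follows. This is an illustration only, not the code in HdfsScanNode.java: the class name, method name, parameters, and the buffer-count formula are all hypothetical, and the inputs in main() are made up. Only the two behaviours come from the description: no I/O buffer term for a metadata-only scan, and a 1 MB floor on the result.

```java
// Illustrative sketch only; names and the buffer formula are assumptions,
// not Impala's actual planner code.
final class ScanMemEstimateSketch {
    // Floor described in the commit message: never estimate below 1 MB.
    static final long MIN_MEM_ESTIMATE_BYTES = 1024L * 1024L;

    /**
     * Hypothetical per-host scan memory estimate.
     *
     * @param metadataOnly      true if the scan only reads file metadata
     * @param numScannerThreads assumed number of concurrent scanner threads
     * @param buffersPerThread  assumed I/O buffers held by each thread
     * @param ioBufferBytes     assumed size of one I/O buffer in bytes
     */
    static long estimateScanMemBytes(boolean metadataOnly, int numScannerThreads,
            int buffersPerThread, long ioBufferBytes) {
        long estimate = 0;
        if (!metadataOnly) {
            // Only scans that read data are charged for I/O buffers.
            estimate = (long) numScannerThreads * buffersPerThread * ioBufferBytes;
        }
        // Apply the floor so the estimate is never 0 bytes.
        return Math.max(estimate, MIN_MEM_ESTIMATE_BYTES);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Metadata-only scan: no I/O buffers, so the floor applies.
        System.out.println(estimateScanMemBytes(true, 4, 1, 8 * mb));
        // Regular scan with the same made-up inputs: buffers dominate.
        System.out.println(estimateScanMemBytes(false, 4, 1, 8 * mb));
    }
}
```

With these made-up inputs the metadata-only case lands on the 1 MB floor, while the regular scan is still charged for its buffers.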

The estimate is now much closer to actual consumption
(it was 80 MB before):

  [localhost:21000] > select count(*) from tpch_parquet.lineitem; summary;
  Query: select count(*) from tpch_parquet.lineitem
  Query submitted at: 2017-08-23 11:58:29 (Coordinator: http://tarmstrong-box:25000)
  Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=cb4b8d41fc838c9a:c5496ff300000000
  +----------+
  | count(*) |
  +----------+
  | 6001215  |
  +----------+
  Fetched 1 row(s) in 0.13s
  +--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+
  | Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                |
  +--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+
  | 03:AGGREGATE | 1      | 168.49us | 168.49us | 1     | 1          | 28.00 KB  | 10.00 MB      | FINALIZE              |
  | 02:EXCHANGE  | 1      | 30.11ms  | 30.11ms  | 3     | 1          | 0 B       | 0 B           | UNPARTITIONED         |
  | 01:AGGREGATE | 3      | 2.05us   | 6.14us   | 3     | 1          | 20.00 KB  | 10.00 MB      |                       |
  | 00:SCAN HDFS | 3      | 4.58ms   | 4.72ms   | 3     | 6.00M      | 128.00 KB | 1.00 MB       | tpch_parquet.lineitem |
  +--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------+

Testing:
Updated affected planner tests.

Change-Id: Iaf5c2316bef2afae54a94245c715534ed294f286
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
3 files changed, 21 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7783/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7783

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaf5c2316bef2afae54a94245c715534ed294f286
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>