You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/11/07 22:56:00 UTC

[jira] [Commented] (IMPALA-7791) Aggregation Node memory estimates don't account for number of fragment instances

    [ https://issues.apache.org/jira/browse/IMPALA-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678914#comment-16678914 ] 

ASF subversion and git services commented on IMPALA-7791:
---------------------------------------------------------

Commit b48e25689515ffa6a4bfd4c7d8bab8412e85387b in impala's branch refs/heads/master from poojanilangekar
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=b48e256 ]

IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Previously, the AggregationNode calculated the estimated number
of rows based on input cardinality without accounting for the
division of input data across multiple fragment instances. This
bloated up the memory estimates for the node. After this change,
the AggregationNode accounts for the number of fragment instances
while estimating the number of rows per instance. A skew factor of
1.5 was added to account for data skew among multiple fragment
instances. This number was derived using empirical analysis of
real-world and benchmark (tpch, tpcds) queries.

Testing:
Tested queries with changed estimates to avoid cases of
significant underestimation of memory.
Ran front-end and end-to-end tests affected by this change.

Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Reviewed-on: http://gerrit.cloudera.org:8080/11854
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Aggregation Node memory estimates don't account for number of fragment instances
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-7791
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7791
>             Project: IMPALA
>          Issue Type: Sub-task
>    Affects Versions: Impala 3.1.0
>            Reporter: Pooja Nilangekar
>            Assignee: Pooja Nilangekar
>            Priority: Blocker
>
> AggregationNode's memory estimates are calculated based on the input cardinality of the node, without accounting for the division of input data across fragment instances. This results in very high memory estimates. In reality, the nodes often use only a part of this memory.   
> Example query:
> {code:java}
> [localhost:21000] default> select distinct * from tpch.lineitem limit 5; 
> {code}
> Summary: 
> {code:java}
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                                                                                                                                                                                                                                                                                                                                                                                                                            |
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | 04:EXCHANGE  | 1      | 21.24us  | 21.24us  | 5     | 5          | 48.00 KB  | 16.00 KB      | UNPARTITIONED                                                                                                                                                                                                                                                                                                                                                                                                                     |
> | 03:AGGREGATE | 3      | 5.11s    | 5.15s    | 15    | 5          | 576.21 MB | 1.62 GB       | FINALIZE                                                                                                                                                                                                                                                                                                                                                                                                                          |
> | 02:EXCHANGE  | 3      | 709.75ms | 728.91ms | 6.00M | 6.00M      | 5.46 MB   | 10.78 MB      | HASH(tpch.lineitem.l_orderkey,tpch.lineitem.l_partkey,tpch.lineitem.l_suppkey,tpch.lineitem.l_linenumber,tpch.lineitem.l_quantity,tpch.lineitem.l_extendedprice,tpch.lineitem.l_discount,tpch.lineitem.l_tax,tpch.lineitem.l_returnflag,tpch.lineitem.l_linestatus,tpch.lineitem.l_shipdate,tpch.lineitem.l_commitdate,tpch.lineitem.l_receiptdate,tpch.lineitem.l_shipinstruct,tpch.lineitem.l_shipmode,tpch.lineitem.l_comment) |
> | 01:AGGREGATE | 3      | 4.37s    | 4.70s    | 6.00M | 6.00M      | 36.77 MB  | 1.62 GB       | STREAMING                                                                                                                                                                                                                                                                                                                                                                                                                         |
> | 00:SCAN HDFS | 3      | 437.14ms | 480.60ms | 6.00M | 6.00M      | 65.51 MB  | 264.00 MB     | tpch.lineitem                                                                                                                                                                                                                                                                                                                                                                                                                     |
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> {code}
> The plan estimates 3.50 GB memory per host but the query ends up with a peak memory usage of 682.07 MB. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org