You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Sun Rui (JIRA)" <ji...@apache.org> on 2013/12/29 14:39:50 UTC
[jira] [Created] (HIVE-6120) Add GroupBy optimization to eliminate
un-needed partial distinct aggregations
Sun Rui created HIVE-6120:
-----------------------------
Summary: Add GroupBy optimization to eliminate un-needed partial distinct aggregations
Key: HIVE-6120
URL: https://issues.apache.org/jira/browse/HIVE-6120
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Sun Rui
Assignee: Sun Rui
In most cases, partial distinct aggregation is not needed in map-side groupby. The exception is that with sorted bucketized tables partial distinct aggregation can be done by the mappers in some scenarios, as what is done by GroupByOptimzer.
Currently, partial distinct aggregation is done in the map-side GroupBy and then shuffle of the partial result is done in the following ReduceSink operator, in cases where they are not needed. This wastes CPU cycles, memory and network bandwidth.
This optimization eliminates un-needed partial distinct aggregations, which improves performance and reduces memory usage.
For example,
EXPLAIN SELECT key, count(DISTINCT value) FROM src GROUP BY key;
Before optimization:
{noformat}
Group By Operator
aggregations:
expr: count(DISTINCT value)
bucketGroup: false
keys:
expr: key
type: int
expr: value
type: string
mode: hash
outputColumnNames: _col0, _col1, _col2
Reduce Output Operator
key expressions:
expr: _col0
type: int
expr: _col1
type: string
sort order: ++
Map-reduce partition columns:
expr: _col0
type: int
tag: -1
value expressions:
expr: _col2
type: bigint
{noformat}
After optimization:
{noformat}
Group By Operator
bucketGroup: false
keys:
expr: key
type: int
expr: value
type: string
mode: hash
outputColumnNames: _col0, _col1
Reduce Output Operator
key expressions:
expr: _col0
type: int
expr: _col1
type: string
sort order: ++
Map-reduce partition columns:
expr: _col0
type: int
tag: -1
{noformat}
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)