Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2014/05/31 04:20:01 UTC
[jira] [Created] (HIVE-7156) Group-By operator stat-annotation only
uses distinct approx to generate rollups
Gopal V created HIVE-7156:
-----------------------------
Summary: Group-By operator stat-annotation only uses distinct approx to generate rollups
Key: HIVE-7156
URL: https://issues.apache.org/jira/browse/HIVE-7156
Project: Hive
Issue Type: Bug
Reporter: Gopal V
The stats annotation for a group-by applies the distinct-value estimate only to the reduce-side row count.
The map-side Group By (mode: hash) is annotated with its input row count as its output row count, instead of distinct * parallelism, while the reducer-side Group By (mode: mergepartial) gets the correct distinct-value row count.
{code}
hive> explain select distinct L_SHIPDATE from lineitem;
Vertices:
  Map 1
    Map Operator Tree:
      TableScan
        alias: lineitem
        Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
        Select Operator
          expressions: l_shipdate (type: string)
          outputColumnNames: l_shipdate
          Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
          Group By Operator
            keys: l_shipdate (type: string)
            mode: hash
            outputColumnNames: _col0
            Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
    Execution mode: vectorized
  Reducer 2
    Reduce Operator Tree:
      Group By Operator
        keys: KEY._col0 (type: string)
        mode: mergepartial
        outputColumnNames: _col0
        Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
        Select Operator
          expressions: _col0 (type: string)
          outputColumnNames: _col0
          Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
{code}
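For illustration, the "distinct * parallelism" estimate for the map-side Group By could be sketched as below. This is only a rough sketch, not Hive's actual stats-annotation code: the function name, the cap at the input row count, and the parallelism value of 1000 are all assumptions made for the example. The row count and distinct count are the ones shown in the plan above.

```python
def estimate_map_side_rows(input_rows: int, distinct_values: int, parallelism: int) -> int:
    """Estimate rows emitted by a map-side hash Group By across all map tasks.

    Each map task can emit at most one row per distinct key it sees, so the
    total is bounded by distinct_values * parallelism. Capping at input_rows
    is an assumption of this sketch: aggregation cannot emit more rows than
    it reads.
    """
    if input_rows < 0 or distinct_values < 0 or parallelism < 0:
        raise ValueError("all arguments must be non-negative")
    return min(input_rows, distinct_values * parallelism)


if __name__ == "__main__":
    input_rows = 5999989709   # TableScan row count from the plan
    distinct_values = 1955    # reducer-side Group By row count from the plan
    parallelism = 1000        # hypothetical number of map tasks

    print(estimate_map_side_rows(input_rows, distinct_values, parallelism))
```

With these inputs the sketch yields 1955000 rows for the map side, compared with the 5999989709 rows the plan currently reports for the hash-mode Group By.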