Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2014/05/31 04:20:01 UTC

[jira] [Created] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups

Gopal V created HIVE-7156:
-----------------------------

             Summary: Group-By operator stat-annotation only uses distinct approx to generate rollups
                 Key: HIVE-7156
                 URL: https://issues.apache.org/jira/browse/HIVE-7156
             Project: Hive
          Issue Type: Bug
            Reporter: Gopal V


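The stats annotation for a group-by operator only applies the distinct-value estimate to the reduce-side row count.

The map-side group-by is annotated with its input row count as its output row count, instead of (distinct values * parallelism), while the reduce-side group-by gets the correct row count. In the plan below, the map-side Group By Operator reports 5999989709 rows, the same as the TableScan, while the reducer-side one reports 1955.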

{code}
hive> explain select distinct L_SHIPDATE from lineitem;

      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: lineitem
                  Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: l_shipdate (type: string)
                    outputColumnNames: l_shipdate
                    Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      keys: l_shipdate (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Reducer 2 
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: string)
                  outputColumnNames: _col0
                  Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
{code}
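
As a rough illustration of the estimate described above, the following sketch computes a map-side row count as distinct values * parallelism, capped at the input row count. This is not Hive's implementation: the function name, the cap at the input row count, and the figure of 1000 map tasks are assumptions made for the example. Only the row count and the distinct count are taken from the plan above.

```python
def estimate_map_side_group_by_rows(input_rows: int,
                                    distinct_keys: int,
                                    parallelism: int) -> int:
    """Estimate rows emitted by a map-side hash group-by.

    Assumes each of the `parallelism` map tasks can see every distinct key
    and emits each key at most once, so the output is bounded by
    distinct_keys * parallelism and can never exceed the input row count.
    """
    if input_rows < 0 or distinct_keys < 0 or parallelism < 1:
        raise ValueError("row counts must be >= 0 and parallelism >= 1")
    return min(input_rows, distinct_keys * parallelism)


# Values taken from the explain output above.
INPUT_ROWS = 5999989709
DISTINCT_SHIPDATES = 1955

# Hypothetical number of map tasks; the issue does not state one.
ASSUMED_MAP_TASKS = 1000

map_side_rows = estimate_map_side_group_by_rows(
    INPUT_ROWS, DISTINCT_SHIPDATES, ASSUMED_MAP_TASKS)
print(map_side_rows)
```

Under these assumptions the map-side estimate would be several orders of magnitude below the 5999989709 rows currently annotated on the map-side Group By Operator and Reduce Output Operator.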


