Posted to dev@hive.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2018/05/18 19:33:00 UTC
[jira] [Created] (HIVE-19607) Pushing Aggregates on Top of Aggregates
slim bouguerra created HIVE-19607:
-------------------------------------
Summary: Pushing Aggregates on Top of Aggregates
Key: HIVE-19607
URL: https://issues.apache.org/jira/browse/HIVE-19607
Project: Hive
Issue Type: Sub-task
Reporter: slim bouguerra
Fix For: 3.1.0
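This plan shows a case where the count aggregate could also be pushed down to Druid, which would eliminate the last-stage reducer. Currently only the inner groupBy on cstring2 (with the doubleSum of cdouble) is pushed to Druid; Hive then computes count(cstring2) and sum($f1) over that result in a map stage followed by a reducer.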
{code}
+PREHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table
+PREHOOK: type: QUERY
+POSTHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Tez
+#### A masked pattern was here ####
+ Edges:
+ Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+#### A masked pattern was here ####
+ Vertices:
+ Map 1
+ Map Operator Tree:
+ TableScan
+ alias: druid_table
+ properties:
+ druid.fieldNames cstring2,$f1
+ druid.fieldTypes string,double
+ druid.query.json {"queryType":"groupBy","dataSource":"default.druid_table","granularity":"all","dimensions":[{"type":"default","dimension":"cstring2","outputName":"cstring2","outputType":"STRING"}],"limitSpec":{"type":"default"},"aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"cdouble"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]}
+ druid.query.type groupBy
+ Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE
+ Select Operator
+ expressions: cstring2 (type: string), $f1 (type: double)
+ outputColumnNames: cstring2, $f1
+ Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE
+ Group By Operator
+ aggregations: count(cstring2), sum($f1)
+ mode: hash
+ outputColumnNames: _col0, _col1
+ Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+ Reduce Output Operator
+ sort order:
+ Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+ value expressions: _col0 (type: bigint), _col1 (type: double)
+ Reducer 2
+ Reduce Operator Tree:
+ Group By Operator
+ aggregations: count(VALUE._col0), sum(VALUE._col1)
+ mode: mergepartial
+ outputColumnNames: _col0, _col1
+ Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+ table:
+ input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
{code}
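To illustrate why the aggregates on top of the groupBy are equivalent to the original query, here is a minimal Python sketch. It is not Hive or Druid code: the sample rows and the function names are made up for illustration, and NULL handling is simplified (a NULL cdouble contributes 0 to a group's sum).

```python
from collections import defaultdict

# Hypothetical sample rows (cstring2, cdouble); None stands in for SQL NULL.
rows = [("a", 1.0), ("a", 2.5), ("b", 4.0), (None, 3.0), ("b", None)]


def direct(rows):
    """count(DISTINCT cstring2), sum(cdouble) evaluated directly on the rows."""
    distinct = {k for k, _ in rows if k is not None}
    total = sum(v for _, v in rows if v is not None)
    return len(distinct), total


def inner_group_by(rows):
    """Inner stage: GROUP BY cstring2 with sum(cdouble) AS $f1.

    Simplification: a NULL cdouble is treated as 0 within its group.
    """
    groups = defaultdict(float)
    for k, v in rows:
        groups[k] += v if v is not None else 0.0
    return list(groups.items())


def outer_aggregate(grouped):
    """Outer stage: count(cstring2), sum($f1) over the grouped rows."""
    count = sum(1 for k, _ in grouped if k is not None)  # count() skips NULL keys
    total = sum(f1 for _, f1 in grouped)
    return count, total


if __name__ == "__main__":
    print(direct(rows))
    print(outer_aggregate(inner_group_by(rows)))
```

Both calls return the same pair for this sample, which is the property that would let the outer aggregation run in Druid instead of in a Hive reducer.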