Posted to issues@hive.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2017/12/22 14:52:00 UTC
[jira] [Assigned] (HIVE-18226) handle UDF to double/int over aggregate
[ https://issues.apache.org/jira/browse/HIVE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
slim bouguerra reassigned HIVE-18226:
-------------------------------------
Assignee: slim bouguerra
> handle UDF to double/int over aggregate
> ---------------------------------------
>
> Key: HIVE-18226
> URL: https://issues.apache.org/jira/browse/HIVE-18226
> Project: Hive
> Issue Type: Sub-task
> Components: Druid integration
> Reporter: slim bouguerra
> Assignee: slim bouguerra
>
> In cases like the following query, the Hive planner adds an extra UDFToDouble over integer columns.
> This kind of UDF can be pushed to Druid by using a doubleSum aggregator instead of longSum, and vice versa.
> {code}
> PREHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
> FROM druid_table GROUP BY floor_year(`__time`)
> PREHOOK: type: QUERY
> POSTHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
> FROM druid_table GROUP BY floor_year(`__time`)
> POSTHOOK: type: QUERY
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: druid_table
>             properties:
>               druid.query.json {"queryType":"timeseries","dataSource":"default.druid_table","descending":false,"granularity":"year","aggregations":[{"type":"longSum","name":"$f1","fieldName":"ctinyint"},{"type":"count","name":"$f2"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],"context":{"skipEmptyBuckets":true}}
>               druid.query.type timeseries
>             Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Select Operator
>               expressions: __time (type: timestamp with local time zone), (UDFToDouble($f1) / UDFToDouble($f2)) (type: double)
>               outputColumnNames: _col0, _col1
>               Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               File Output Operator
>                 compressed: false
>                 Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                 table:
>                     input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
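A rough illustration of the rewrite the description proposes, written as a standalone Python sketch rather than Hive planner code. The helper name `push_cast_into_aggregation` and its arguments are hypothetical; the sketch only assumes that a cast over a sum can be expressed by switching the Druid aggregator between `longSum` and `doubleSum`, as the description suggests. The query string is a shortened copy of the `druid.query.json` shown in the plan above.

```python
import json

# Hypothetical mapping (not part of Hive): for each target type, the Druid
# sum aggregator to replace and the one to replace it with.
CAST_TO_AGG = {
    "double": ("longSum", "doubleSum"),
    "int": ("doubleSum", "longSum"),
}


def push_cast_into_aggregation(query_json, output_name, target):
    """Return the Druid query with the named sum aggregator retyped.

    Only sum aggregators are touched; other aggregators (such as count)
    are left as they are in this sketch.
    """
    query = json.loads(query_json)
    old_type, new_type = CAST_TO_AGG[target]
    for agg in query["aggregations"]:
        if agg.get("name") == output_name and agg["type"] == old_type:
            agg["type"] = new_type
    return json.dumps(query, separators=(",", ":"))


# Shortened version of the druid.query.json from the EXPLAIN output.
original = (
    '{"queryType":"timeseries","dataSource":"default.druid_table",'
    '"granularity":"year","aggregations":['
    '{"type":"longSum","name":"$f1","fieldName":"ctinyint"},'
    '{"type":"count","name":"$f2"}]}'
)

# UDFToDouble($f1) would become a doubleSum computed by Druid.
rewritten = push_cast_into_aggregation(original, "$f1", "double")
print(rewritten)
```

In this sketch the `count` aggregator is unchanged, so a cast over `$f2` would still have to be evaluated on the Hive side or handled by some other mechanism.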
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)