Posted to issues@hive.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2017/12/22 14:52:00 UTC

[jira] [Assigned] (HIVE-18226) handle UDF to double/int over aggregate

     [ https://issues.apache.org/jira/browse/HIVE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

slim bouguerra reassigned HIVE-18226:
-------------------------------------

    Assignee: slim bouguerra

> handle UDF to double/int over aggregate
> ---------------------------------------
>
>                 Key: HIVE-18226
>                 URL: https://issues.apache.org/jira/browse/HIVE-18226
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Druid integration
>            Reporter: slim bouguerra
>            Assignee: slim bouguerra
>
> In queries like the one below, the Hive planner adds an extra UDFToDouble cast over the integer aggregate columns.
> This kind of UDF can be pushed down to Druid by using a doubleSum aggregator instead of longSum, and vice versa for a cast to int.
> {code}
> PREHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
> FROM druid_table GROUP BY floor_year(`__time`)
> PREHOOK: type: QUERY
> POSTHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
> FROM druid_table GROUP BY floor_year(`__time`)
> POSTHOOK: type: QUERY
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: druid_table
>             properties:
>               druid.query.json {"queryType":"timeseries","dataSource":"default.druid_table","descending":false,"granularity":"year","aggregations":[{"type":"longSum","name":"$f1","fieldName":"ctinyint"},{"type":"count","name":"$f2"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],"context":{"skipEmptyBuckets":true}}
>               druid.query.type timeseries
>             Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Select Operator
>               expressions: __time (type: timestamp with local time zone), (UDFToDouble($f1) / UDFToDouble($f2)) (type: double)
>               outputColumnNames: _col0, _col1
>               Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               File Output Operator
>                 compressed: false
>                 Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                 table:
>                     input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
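
A rough sketch of the rewrite the description asks for, under stated assumptions and not Hive's actual implementation: given the Druid query JSON from the plan and the names of the aggregate outputs that Hive wraps in UDFToDouble, switch the matching longSum aggregators to doubleSum. The function name push_double_cast, its arguments, and the TO_DOUBLE mapping are hypothetical; only the JSON field names come from the plan above. The count aggregator is left alone, so a cast over it would still have to run in Hive, and the reverse direction (double to int) is omitted.

```python
import copy
import json

# Hypothetical mapping of Druid sum aggregator types to their double variant.
# "count" has no entry, so a cast over it cannot be pushed by this sketch.
TO_DOUBLE = {"longSum": "doubleSum"}


def push_double_cast(druid_query, cast_outputs):
    """Return (rewritten_query, pushed_names).

    Aggregators whose output name is in cast_outputs and whose type has a
    double variant are switched to that variant. The input is not modified.
    """
    rewritten = copy.deepcopy(druid_query)
    pushed = set()
    for agg in rewritten.get("aggregations", []):
        if agg.get("name") in cast_outputs and agg.get("type") in TO_DOUBLE:
            agg["type"] = TO_DOUBLE[agg["type"]]
            pushed.add(agg["name"])
    return rewritten, pushed


# Trimmed copy of the druid.query.json value shown in the plan.
query = json.loads("""
{"queryType": "timeseries",
 "dataSource": "default.druid_table",
 "granularity": "year",
 "aggregations": [
   {"type": "longSum", "name": "$f1", "fieldName": "ctinyint"},
   {"type": "count", "name": "$f2"}]}
""")

# Both $f1 and $f2 are wrapped in UDFToDouble in the Select Operator.
rewritten, pushed = push_double_cast(query, {"$f1", "$f2"})
print(json.dumps(rewritten["aggregations"]))
print(sorted(pushed))
```

In this sketch the caller would drop UDFToDouble from the Select Operator only for the names returned in pushed and keep the cast in Hive for everything else.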



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)