Posted to issues@hive.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2017/12/05 22:24:00 UTC
[jira] [Commented] (HIVE-17716) Not pushing postaggregations into Druid due to CAST on constant

    [ https://issues.apache.org/jira/browse/HIVE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279284#comment-16279284 ] 

slim bouguerra commented on HIVE-17716:
---------------------------------------

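Even without a CAST, I still see that the project is not pushed down. In the following example, the {{+ 1}} is still evaluated by the Hive Select Operator ({{($f1 + 1)}}) instead of being sent to Druid as a post-aggregation.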

{code}
PREHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(cint) + 1
FROM druid_table GROUP BY floor_year(`__time`)
PREHOOK: type: QUERY
POSTHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(cint) + 1
FROM druid_table GROUP BY floor_year(`__time`)
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: druid_table
            properties:
              druid.query.json {"queryType":"timeseries","dataSource":"default.druid_table","descending":false,"granularity":"year","aggregations":[{"type":"longSum","name":"$f1","fieldName":"cint"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],"context":{"skipEmptyBuckets":true}}
              druid.query.type timeseries
            Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: __time (type: timestamp with local time zone), ($f1 + 1) (type: bigint)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

{code}
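For reference, here is a minimal Python sketch of what pushing the {{($f1 + 1)}} projection would amount to. The Select Operator's arithmetic would become a Druid {{arithmetic}} post-aggregator over the {{longSum}} result. This is not Hive or Calcite code. The helper {{push_add_constant}} and the output name {{postagg#0}} are hypothetical. Only the post-aggregator JSON shape ({{arithmetic}}, {{fieldAccess}}, {{constant}}) follows the Druid query spec.

```python
import copy
import json

# Druid query generated by Hive for the example above (copied from druid.query.json).
DRUID_QUERY = json.loads(
    '{"queryType":"timeseries","dataSource":"default.druid_table","descending":false,'
    '"granularity":"year","aggregations":[{"type":"longSum","name":"$f1","fieldName":"cint"}],'
    '"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],'
    '"context":{"skipEmptyBuckets":true}}'
)


def push_add_constant(query, agg_name, constant, output_name):
    """Hypothetical helper (not part of Hive/Calcite): return a copy of `query`
    with an arithmetic post-aggregation computing `agg_name + constant`,
    i.e. what the Hive Select Operator ($f1 + 1) evaluates today."""
    agg_names = {agg["name"] for agg in query.get("aggregations", [])}
    if agg_name not in agg_names:
        raise ValueError(f"unknown aggregation: {agg_name}")
    post_agg = {
        "type": "arithmetic",
        "name": output_name,  # hypothetical name; the planner picks its own
        "fn": "+",
        "fields": [
            {"type": "fieldAccess", "name": agg_name, "fieldName": agg_name},
            {"type": "constant", "name": "const", "value": constant},
        ],
    }
    pushed = copy.deepcopy(query)  # leave the original query untouched
    pushed.setdefault("postAggregations", []).append(post_agg)
    return pushed


if __name__ == "__main__":
    print(json.dumps(push_add_constant(DRUID_QUERY, "$f1", 1, "postagg#0"), indent=2))
```

With such a post-aggregation in place, the Hive Select Operator would only need to project {{__time}} and the post-aggregated column.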

> Not pushing postaggregations into Druid due to CAST on constant
> ---------------------------------------------------------------
>
>                 Key: HIVE-17716
>                 URL: https://issues.apache.org/jira/browse/HIVE-17716
>             Project: Hive
>          Issue Type: Improvement
>          Components: Druid integration
>            Reporter: Jesus Camacho Rodriguez
>
> After Calcite is upgraded to 1.14 and the rule to push post-aggregations to Druid is enabled, the following query fails to create a post-aggregation:
> {code}
> EXPLAIN
> SELECT language, sum(added) + 100 AS a
> FROM druid_table_1
> GROUP BY language
> ORDER BY a DESC;
> {code}
> The problem seems to be that the CAST gets in the way of the rule being applied. In particular, this is the final Calcite plan:
> {code}
>  HiveSortLimit(sort0=[$1], dir0=[DESC-nulls-last])
>   HiveProject(language=[$0], a=[+($1, CAST(100):DOUBLE)])
>     DruidQuery(table=[[default.druid_table_1]], intervals=[[1900-01-01T00:00:00.000/3000-01-01T00:00:00.000]], groups=[{6}], aggs=[[sum($10)]])
> {code}
> There are two different parts to explore to find a solution: 1) why {{CAST(100):DOUBLE}} is not folded to {{100.0d}}, and 2) whether the rule to push post-aggregations to Druid could handle the CAST in some particular cases.
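Option 1) can be illustrated with a small, self-contained Python model of the projected expression {{+($1, CAST(100):DOUBLE)}}. This is not Calcite's expression API or its reduction rules. The classes {{InputRef}}, {{Literal}}, {{Cast}}, {{Plus}} and the {{is_pushable}} check are hypothetical, and the model assumes that only column references, literals and {{+}} can become a Druid arithmetic post-aggregation. It shows that once the CAST over the literal is folded into a DOUBLE literal, the expression qualifies.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical expression nodes, loosely mirroring the HiveProject expression.
@dataclass(frozen=True)
class InputRef:  # reference to a DruidQuery output column, e.g. $1
    index: int


@dataclass(frozen=True)
class Literal:
    value: Union[int, float]
    sql_type: str  # only "INTEGER" and "DOUBLE" in this sketch


@dataclass(frozen=True)
class Cast:
    operand: "Expr"
    sql_type: str


@dataclass(frozen=True)
class Plus:
    left: "Expr"
    right: "Expr"


Expr = Union[InputRef, Literal, Cast, Plus]

_PY_TYPES = {"INTEGER": int, "DOUBLE": float}


def fold(expr):
    """Fold CAST over a numeric literal into a literal of the target type,
    e.g. CAST(100):DOUBLE -> Literal(100.0, "DOUBLE"); recurse into '+'."""
    if isinstance(expr, Cast):
        inner = fold(expr.operand)
        if isinstance(inner, Literal) and expr.sql_type in _PY_TYPES:
            return Literal(_PY_TYPES[expr.sql_type](inner.value), expr.sql_type)
        return Cast(inner, expr.sql_type)
    if isinstance(expr, Plus):
        return Plus(fold(expr.left), fold(expr.right))
    return expr


def is_pushable(expr):
    """Hypothetical rule check: input refs, literals and '+' map to a Druid
    arithmetic post-aggregation; any remaining CAST blocks the push-down."""
    if isinstance(expr, (InputRef, Literal)):
        return True
    if isinstance(expr, Plus):
        return is_pushable(expr.left) and is_pushable(expr.right)
    return False


# +($1, CAST(100):DOUBLE), as in the HiveProject of the plan above
A = Plus(InputRef(1), Cast(Literal(100, "INTEGER"), "DOUBLE"))

if __name__ == "__main__":
    print(is_pushable(A), is_pushable(fold(A)))
```

Option 2) would instead teach {{is_pushable}} (and the post-aggregation translation) to accept a CAST of a numeric literal directly, without folding it first.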


