Posted to issues@hive.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2017/12/05 22:24:00 UTC
[jira] [Commented] (HIVE-17716) Not pushing postaggregations into
Druid due to CAST on constant
[ https://issues.apache.org/jira/browse/HIVE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279284#comment-16279284 ]
slim bouguerra commented on HIVE-17716:
---------------------------------------
Even without the CAST, I still see that the Project is not pushed down to Druid, as the following example shows.
{code}
PREHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(cint) + 1
FROM druid_table GROUP BY floor_year(`__time`)
PREHOOK: type: QUERY
POSTHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(cint) + 1
FROM druid_table GROUP BY floor_year(`__time`)
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: druid_table
properties:
druid.query.json {"queryType":"timeseries","dataSource":"default.druid_table","descending":false,"granularity":"year","aggregations":[{"type":"longSum","name":"$f1","fieldName":"cint"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],"context":{"skipEmptyBuckets":true}}
druid.query.type timeseries
Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: __time (type: timestamp with local time zone), ($f1 + 1) (type: bigint)
outputColumnNames: _col0, _col1
Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
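For comparison, here is a rough sketch of what the native query might look like if the {{$f1 + 1}} projection were pushed down as an arithmetic post-aggregation. The base query is copied from the plan above; the helper {{with_plus_constant}} and the output name {{postagg_0}} are made up for illustration, and the JSON Hive/Calcite would actually generate may differ.

```python
import json

# Native query copied from the EXPLAIN output above.
druid_query = {
    "queryType": "timeseries",
    "dataSource": "default.druid_table",
    "descending": False,
    "granularity": "year",
    "aggregations": [{"type": "longSum", "name": "$f1", "fieldName": "cint"}],
    "intervals": ["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],
    "context": {"skipEmptyBuckets": True},
}


def with_plus_constant(query, agg_name, constant, out_name):
    """Return a copy of `query` with `agg_name + constant` added as a
    Druid arithmetic post-aggregation (hypothetical helper)."""
    pushed = dict(query)  # shallow copy; the original query is left untouched
    pushed["postAggregations"] = list(query.get("postAggregations", [])) + [
        {
            "type": "arithmetic",
            "name": out_name,  # assumed output column name
            "fn": "+",
            "fields": [
                {"type": "fieldAccess", "name": agg_name, "fieldName": agg_name},
                {"type": "constant", "name": "const", "value": constant},
            ],
        }
    ]
    return pushed


pushed_query = with_plus_constant(druid_query, "$f1", 1, "postagg_0")
print(json.dumps(pushed_query, indent=2))
```

With something like this in the scan, the Select Operator on top would only need to forward columns instead of computing {{$f1 + 1}} in Hive.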
> Not pushing postaggregations into Druid due to CAST on constant
> ---------------------------------------------------------------
>
> Key: HIVE-17716
> URL: https://issues.apache.org/jira/browse/HIVE-17716
> Project: Hive
> Issue Type: Improvement
> Components: Druid integration
> Reporter: Jesus Camacho Rodriguez
>
> After Calcite is upgraded to 1.14 and the rule to push post-aggregations to Druid is enabled, the following query fails to create a postaggregation:
> {code}
> EXPLAIN
> SELECT language, sum(added) + 100 AS a
> FROM druid_table_1
> GROUP BY language
> ORDER BY a DESC;
> {code}
> The problem seems to be that the CAST is getting in the way of the rule being applied. In particular, this is the final Calcite plan:
> {code}
> HiveSortLimit(sort0=[$1], dir0=[DESC-nulls-last])
> HiveProject(language=[$0], a=[+($1, CAST(100):DOUBLE)])
> DruidQuery(table=[[default.druid_table_1]], intervals=[[1900-01-01T00:00:00.000/3000-01-01T00:00:00.000]], groups=[{6}], aggs=[[sum($10)]])
> {code}
> There are two different parts to explore to seek a solution: 1) why {{CAST(100):DOUBLE}} is not folded to {{100.0d}}, and 2) whether the rule to push post-aggregations to Druid could handle the CAST in some particular cases.
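
To make point 1) of the description concrete, the following is a toy sketch of folding a CAST over a literal into a typed literal. It is not Calcite code: the classes {{Literal}}, {{InputRef}}, {{Cast}}, {{Call}} and the function {{fold_casts}} are hypothetical stand-ins for the expression tree in the plan above, and only the INTEGER-to-DOUBLE case is handled.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Literal:
    value: Union[int, float]
    type: str  # e.g. "INTEGER" or "DOUBLE"


@dataclass(frozen=True)
class InputRef:
    index: int  # stands in for $0, $1, ... in the plan


@dataclass(frozen=True)
class Cast:
    operand: object
    target: str


@dataclass(frozen=True)
class Call:
    op: str
    operands: Tuple[object, ...]


def fold_casts(expr):
    """Rewrite CAST(<numeric literal>):DOUBLE into a DOUBLE literal, recursively."""
    if isinstance(expr, Cast):
        inner = fold_casts(expr.operand)
        if isinstance(inner, Literal) and expr.target == "DOUBLE":
            return Literal(float(inner.value), "DOUBLE")
        return Cast(inner, expr.target)  # anything else is left as a CAST
    if isinstance(expr, Call):
        return Call(expr.op, tuple(fold_casts(o) for o in expr.operands))
    return expr


# a=[+($1, CAST(100):DOUBLE)] from the HiveProject in the description
plan_expr = Call("+", (InputRef(1), Cast(Literal(100, "INTEGER"), "DOUBLE")))
folded = fold_casts(plan_expr)
print(folded)
```

Once the second operand is a plain literal, the expression has the "aggregate plus constant" shape that a post-aggregation push-down rule would presumably look for; point 2) would instead teach the rule to look through the CAST itself.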
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)