Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2017/06/08 02:01:18 UTC
[jira] [Updated] (HIVE-16852) PTF: RANK() re-evaluates order predicates on the reducer
[ https://issues.apache.org/jira/browse/HIVE-16852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-16852:
---------------------------
Labels: performance tpcds (was: )
> PTF: RANK() re-evaluates order predicates on the reducer
> --------------------------------------------------------
>
> Key: HIVE-16852
> URL: https://issues.apache.org/jira/browse/HIVE-16852
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 2.1.1, 3.0.0
> Reporter: Gopal V
> Labels: performance, tpcds
>
> {code}
> explain select ss_item_sk, rank() over(order by cast(ss_list_price as decimal(38,10))) as r , ss_list_price from store_sales;
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: root_20170608015140_7b0debb9-b14b-4150-b004-9743c6127392:3
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName:
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: store_sales
>                   Statistics: Num rows: 28800426268 Data size: 450435120648 Basic stats: COMPLETE Column stats: COMPLETE
>                   Reduce Output Operator
>                     key expressions: 0 (type: int), CAST( ss_list_price AS decimal(38,10)) (type: decimal(38,10))
>                     sort order: ++
>                     Map-reduce partition columns: 0 (type: int)
>                     Statistics: Num rows: 28800426268 Data size: 450435120648 Basic stats: COMPLETE Column stats: COMPLETE
>                     value expressions: ss_item_sk (type: bigint), ss_list_price (type: double)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Reducer 2
>             Execution mode: llap
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: VALUE._col1 (type: bigint), VALUE._col11 (type: double)
>                 outputColumnNames: _col1, _col11
>                 Statistics: Num rows: 28800426268 Data size: 8399352770616 Basic stats: COMPLETE Column stats: COMPLETE
>                 PTF Operator
>                   Function definitions:
>                       Input definition
>                         input alias: ptf_0
>                         output shape: _col1: bigint, _col11: double
>                         type: WINDOWING
>                       Windowing table definition
>                         input alias: ptf_1
>                         name: windowingtablefunction
>                         order by: CAST( _col11 AS decimal(38,10)) ASC NULLS FIRST
>                         partition by: 0
>                         raw input shape:
>                         window functions:
>                             window function definition
>                               alias: rank_window_0
>                               arguments: CAST( _col11 AS decimal(38,10))
>                               name: rank
>                               window function: GenericUDAFRankEvaluator
>                               window frame: PRECEDING(MAX)~FOLLOWING(MAX)
>                               isPivotResult: true
>                   Statistics: Num rows: 28800426268 Data size: 8399352770616 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: _col1 (type: bigint), rank_window_0 (type: int), _col11 (type: double)
>                     outputColumnNames: _col0, _col1, _col2
>                     Statistics: Num rows: 28800426268 Data size: 565636825720 Basic stats: COMPLETE Column stats: COMPLETE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 28800426268 Data size: 565636825720 Basic stats: COMPLETE Column stats: COMPLETE
>                       table:
>                           input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                           output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
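> This forces the decimal cast to be evaluated roughly twice per row: once in Map 1 to produce the KEY expression for the Reduce Output Operator, and once more in Reducer 2 inside the window function, where the PTF Operator's order by and rank() argument recompute CAST( _col11 AS decimal(38,10)) from the value column instead of reusing the key.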
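
The cost described above can be illustrated with a small toy model. This is not Hive code: `cast_decimal`, `map_side`, `rank_recompute`, `rank_reuse_key` and `run` are hypothetical names invented for this sketch, and the single-partition, in-memory "shuffle" is an assumption made only to count how often the cast runs. It compares a reducer that re-casts the value column (as the plan suggests) with one that ranks on the key already computed on the map side.

```python
from decimal import Decimal

cast_calls = 0  # global counter of simulated CAST evaluations


def cast_decimal(x):
    """Hypothetical stand-in for CAST(x AS decimal(38,10)); counts each call."""
    global cast_calls
    cast_calls += 1
    return Decimal(str(x)).quantize(Decimal("1.0000000000"))


def map_side(rows):
    """Emit (key, value) pairs sorted by key, mimicking the shuffle sort.

    The key is the cast list price; the value keeps the original columns.
    """
    pairs = [(cast_decimal(price), (item, price)) for item, price in rows]
    return sorted(pairs, key=lambda kv: kv[0])


def _rank(shuffled, order_value):
    """RANK() over already-sorted input: ties share a rank, gaps follow ties."""
    out, prev, rank = [], None, 0
    for position, (key, (item, price)) in enumerate(shuffled, start=1):
        current = order_value(key, price)
        if current != prev:
            rank, prev = position, current
        out.append((item, rank, price))
    return out


def rank_recompute(shuffled):
    """Reducer that ignores the key and re-casts the value column."""
    return _rank(shuffled, lambda key, price: cast_decimal(price))


def rank_reuse_key(shuffled):
    """Reducer that ranks on the key computed on the map side."""
    return _rank(shuffled, lambda key, price: key)


def run(rank_fn, rows):
    """Run map + reduce and return (ranked rows, number of casts performed)."""
    global cast_calls
    cast_calls = 0
    ranked = rank_fn(map_side(rows))
    return ranked, cast_calls


if __name__ == "__main__":
    rows = [(1, 9.99), (2, 4.5), (3, 9.99), (4, 1.25)]
    print(run(rank_recompute, rows))
    print(run(rank_reuse_key, rows))
```

In this model both reducers produce the same ranks, but the recomputing variant performs two casts per row while the key-reusing variant performs one. Whether Hive can reference the shuffle key from inside the PTF operator is exactly what the issue is about, so the second variant is only a picture of the desired behaviour.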
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)