Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2017/06/08 02:01:18 UTC
[jira] [Updated] (HIVE-16852) PTF: RANK() re-evaluates order predicates on the reducer

     [ https://issues.apache.org/jira/browse/HIVE-16852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-16852:
---------------------------
    Labels: performance tpcds  (was: )

> PTF: RANK() re-evaluates order predicates on the reducer
> --------------------------------------------------------
>
>                 Key: HIVE-16852
>                 URL: https://issues.apache.org/jira/browse/HIVE-16852
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 2.1.1, 3.0.0
>            Reporter: Gopal V
>              Labels: performance, tpcds
>
> {code}
> explain select ss_item_sk, rank() over(order by cast(ss_list_price as decimal(38,10))) as r , ss_list_price from store_sales;
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: root_20170608015140_7b0debb9-b14b-4150-b004-9743c6127392:3
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName:
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: store_sales
>                   Statistics: Num rows: 28800426268 Data size: 450435120648 Basic stats: COMPLETE Column stats: COMPLETE
>                   Reduce Output Operator
>                     key expressions: 0 (type: int), CAST( ss_list_price AS decimal(38,10)) (type: decimal(38,10))
>                     sort order: ++
>                     Map-reduce partition columns: 0 (type: int)
>                     Statistics: Num rows: 28800426268 Data size: 450435120648 Basic stats: COMPLETE Column stats: COMPLETE
>                     value expressions: ss_item_sk (type: bigint), ss_list_price (type: double)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Reducer 2 
>             Execution mode: llap
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: VALUE._col1 (type: bigint), VALUE._col11 (type: double)
>                 outputColumnNames: _col1, _col11
>                 Statistics: Num rows: 28800426268 Data size: 8399352770616 Basic stats: COMPLETE Column stats: COMPLETE
>                 PTF Operator
>                   Function definitions:
>                       Input definition
>                         input alias: ptf_0
>                         output shape: _col1: bigint, _col11: double
>                         type: WINDOWING
>                       Windowing table definition
>                         input alias: ptf_1
>                         name: windowingtablefunction
>                         order by: CAST( _col11 AS decimal(38,10)) ASC NULLS FIRST
>                         partition by: 0
>                         raw input shape:
>                         window functions:
>                             window function definition
>                               alias: rank_window_0
>                               arguments: CAST( _col11 AS decimal(38,10))
>                               name: rank
>                               window function: GenericUDAFRankEvaluator
>                               window frame: PRECEDING(MAX)~FOLLOWING(MAX)
>                               isPivotResult: true
>                   Statistics: Num rows: 28800426268 Data size: 8399352770616 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: _col1 (type: bigint), rank_window_0 (type: int), _col11 (type: double)
>                     outputColumnNames: _col0, _col1, _col2
>                     Statistics: Num rows: 28800426268 Data size: 565636825720 Basic stats: COMPLETE Column stats: COMPLETE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 28800426268 Data size: 565636825720 Basic stats: COMPLETE Column stats: COMPLETE
>                       table:
>                           input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                           output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
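> This forces the decimal cast to be evaluated roughly twice per row: once on the map side to produce the KEY expression, and again on the reducer within the window function, where the PTF operator re-applies CAST( _col11 AS decimal(38,10)) to the value column instead of reusing the key.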
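
The cost described in the issue can be illustrated with a small toy model. This is not Hive code: `CountingCast`, `rank_sorted`, `run`, and the `reuse_key` flag are hypothetical names made up for this sketch, and the "map side" and "reducer" are plain Python steps. It only counts how many times the decimal cast runs when the reducer recomputes it from the value column, compared with reusing the already-cast sort key.

```python
from decimal import Decimal


class CountingCast:
    """Stand-in for CAST(x AS decimal(38,10)) that counts its invocations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, price):
        self.calls += 1
        # Round to 10 fractional digits, loosely mimicking decimal(38,10).
        return Decimal(str(price)).quantize(Decimal("1.0000000000"))


def rank_sorted(keys):
    """RANK() over keys that are already sorted: ties share a rank, gaps follow."""
    ranks = []
    for i, key in enumerate(keys):
        if i > 0 and key == keys[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)
    return ranks


def run(prices, reuse_key):
    """Return (ranks, number_of_casts) for the toy two-stage pipeline."""
    cast = CountingCast()
    # "Map side": cast once per row to build the sort key, then shuffle sorted.
    shuffled = sorted(((cast(p), p) for p in prices), key=lambda kv: kv[0])
    # "Reducer": either reuse the key (hypothetical fix) or re-cast the value,
    # which is what the plan in this issue appears to do.
    if reuse_key:
        rank_inputs = [key for key, _ in shuffled]
    else:
        rank_inputs = [cast(value) for _, value in shuffled]
    return rank_sorted(rank_inputs), cast.calls


if __name__ == "__main__":
    prices = [3.5, 1.25, 3.5, 2.0]
    print(run(prices, reuse_key=False))
    print(run(prices, reuse_key=True))
```

In this model both variants produce the same ranks; the only difference is the number of cast evaluations, which doubles when the reducer does not reuse the key.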


