Posted to issues@hive.apache.org by "Matt McCline (JIRA)" <ji...@apache.org> on 2015/04/25 02:23:38 UTC

[jira] [Commented] (HIVE-10484) Vectorization : RuntimeException "Big Table Retained Mapping duplicate column"

    [ https://issues.apache.org/jira/browse/HIVE-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512088#comment-14512088 ] 

Matt McCline commented on HIVE-10484:
-------------------------------------


I was able to vectorize the query. I'm wondering which environment variables differ on your side and cause the issue you reported.

{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
        Map 3 <- Map 1 (BROADCAST_EDGE)
        Reducer 4 <- Map 3 (SIMPLE_EDGE)
        Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  Filter Operator
                    predicate: (ss_store_sk is not null and ss_sold_date_sk is not null) (type: boolean)
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 ss_store_sk (type: int)
                        1 s_store_sk (type: int)
                      outputColumnNames: _col0, _col21, _col22, _col26, _col50
                      input vertices:
                        1 Map 2
                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                      HybridGraceHashJoin: true
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                        value expressions: _col21 (type: decimal(7,2)), _col22 (type: int), _col26 (type: int), _col50 (type: string)
            Execution mode: vectorized
        Map 2 
            Map Operator Tree:
                TableScan
                  alias: store
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  Filter Operator
                    predicate: s_store_sk is not null (type: boolean)
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                    Reduce Output Operator
                      key expressions: s_store_sk (type: int)
                      sort order: +
                      Map-reduce partition columns: s_store_sk (type: int)
                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                      value expressions: s_state (type: string)
            Execution mode: vectorized
        Map 3 
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (d_date_sk is not null and d_month_seq BETWEEN 1193 AND 1204) (type: boolean)
                    Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 _col0 (type: int)
                        1 d_date_sk (type: int)
                      outputColumnNames: _col0, _col21, _col22, _col26, _col50, _col58, _col61
                      input vertices:
                        0 Map 1
                      Statistics: Num rows: 20088 Data size: 22478696 Basic stats: COMPLETE Column stats: NONE
                      HybridGraceHashJoin: true
                      Filter Operator
                        predicate: ((_col61 BETWEEN 1193 AND 1204 and (_col58 = _col0)) and (_col26 = _col22)) (type: boolean)
                        Statistics: Num rows: 2511 Data size: 2809837 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: _col50 (type: string), _col21 (type: decimal(7,2))
                          outputColumnNames: _col50, _col21
                          Statistics: Num rows: 2511 Data size: 2809837 Basic stats: COMPLETE Column stats: NONE
                          Group By Operator
                            aggregations: sum(_col21)
                            keys: _col50 (type: string)
                            mode: hash
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 2511 Data size: 2809837 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              key expressions: _col0 (type: string)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: string)
                              Statistics: Num rows: 2511 Data size: 2809837 Basic stats: COMPLETE Column stats: NONE
                              value expressions: _col1 (type: decimal(17,2))
            Execution mode: vectorized
        Reducer 4 
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1255 Data size: 1404358 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string), _col1 (type: decimal(17,2))
                  sort order: +-
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 1255 Data size: 1404358 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: decimal(17,2))
            Execution mode: vectorized
        Reducer 5 
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: decimal(17,2))
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1255 Data size: 1404358 Basic stats: COMPLETE Column stats: NONE
                PTF Operator
                  Function definitions:
                      Input definition
                        input alias: ptf_0
                        output shape: _col0: string, _col1: decimal(17,2)
                        type: WINDOWING
                      Windowing table definition
                        input alias: ptf_1
                        name: windowingtablefunction
                        order by: _col1(DESC)
                        partition by: _col0
                        raw input shape:
                        window functions:
                            window function definition
                              alias: rank_window_0
                              arguments: _col1
                              name: rank
                              window function: GenericUDAFRankEvaluator
                              window frame: PRECEDING(MAX)~FOLLOWING(MAX)
                              isPivotResult: true
                  Statistics: Num rows: 1255 Data size: 1404358 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (rank_window_0 <= 5) (type: boolean)
                    Statistics: Num rows: 418 Data size: 467746 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: _col0 (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 418 Data size: 467746 Basic stats: COMPLETE Column stats: NONE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 418 Data size: 467746 Basic stats: COMPLETE Column stats: NONE
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{noformat}

> Vectorization : RuntimeException "Big Table Retained Mapping duplicate column"
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-10484
>                 URL: https://issues.apache.org/jira/browse/HIVE-10484
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez, Vectorization
>    Affects Versions: 1.2.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Matt McCline
>             Fix For: 1.2.0
>
>
> With vectorization and Tez enabled, TPC-DS Q70 fails with:
> {code}
> Caused by: java.lang.RuntimeException: Big Table Retained Mapping duplicate column 6 in ordered column map {6=(value column: 6, type name: int), 21=(value column: 21, type name: float), 22=(value column: 22, type name: int)} when adding value column 6, type int
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorColumnOrderedMap.add(VectorColumnOrderedMap.java:97)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorColumnOutputMapping.add(VectorColumnOutputMapping.java:40)
> 	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.determineCommonInfo(VectorMapJoinCommonOperator.java:320)
> 	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.<init>(VectorMapJoinCommonOperator.java:254)
> 	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.<init>(VectorMapJoinGenerateResultOperator.java:89)
> 	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.<init>(VectorMapJoinInnerGenerateResultOperator.java:97)
> 	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.<init>(VectorMapJoinInnerLongOperator.java:79)
> 	... 49 more
> {code}
> Query 
> {code}
>  select s_state
>                from  (select s_state as s_state, sum(ss_net_profit),
>                              rank() over ( partition by s_state order by sum(ss_net_profit) desc) as ranking
>                       from   store_sales, store, date_dim
>                       where  d_month_seq between 1193 and 1193+11
>                             and date_dim.d_date_sk = store_sales.ss_sold_date_sk
>                             and store.s_store_sk  = store_sales.ss_store_sk
>                       group by s_state
>                      ) tmp1
>                where ranking <= 5
> {code}
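
For readers unfamiliar with the quoted exception: its message suggests that an ordered column map refused a column number (6) that was already present. The sketch below is a hypothetical illustration of that kind of duplicate check, not Hive's actual VectorColumnOrderedMap code. The class name OrderedColumnMapSketch, its methods, and the use of a plain type-name string as the value are all assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is NOT Hive's VectorColumnOrderedMap.
class OrderedColumnMapSketch {
    private final String name;
    // TreeMap keeps the entries sorted by column number.
    private final Map<Integer, String> columns = new TreeMap<>();

    OrderedColumnMapSketch(String name) {
        this.name = name;
    }

    // Registers a column and rejects a column number that is already present.
    void add(int column, String typeName) {
        if (columns.containsKey(column)) {
            throw new RuntimeException(name + " duplicate column " + column
                + " in ordered column map " + columns
                + " when adding value column " + column + ", type " + typeName);
        }
        columns.put(column, typeName);
    }

    int size() {
        return columns.size();
    }

    public static void main(String[] args) {
        OrderedColumnMapSketch map =
            new OrderedColumnMapSketch("Big Table Retained Mapping");
        map.add(6, "int");
        map.add(21, "float");
        map.add(22, "int");
        try {
            map.add(6, "int"); // the same column is requested a second time
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the second add(6, "int") call throws, which mirrors the shape of the reported failure: the same big-table column being added to the retained mapping twice. Why the planner would request the column twice in the reporter's environment is not established by this thread.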


