Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2014/12/10 23:14:13 UTC

[jira] [Created] (HIVE-9068) Hive : With CBO disabled Vectorization in Map join disabled causing 100% increase in elapsed time and CPU (possibly due to redundant filter operator)

Mostafa Mokhtar created HIVE-9068:
-------------------------------------

             Summary: Hive : With CBO disabled Vectorization in Map join disabled causing 100% increase in elapsed time and CPU (possibly due to redundant filter operator)
                 Key: HIVE-9068
                 URL: https://issues.apache.org/jira/browse/HIVE-9068
             Project: Hive
          Issue Type: Bug
          Components: Vectorization
    Affects Versions: 0.14.0
            Reporter: Mostafa Mokhtar
            Assignee: Matt McCline
             Fix For: 0.14.1


With CBO off, the plan for Map 1 contains a redundant Filter Operator after the second Map Join Operator:
{code}
 Filter Operator
                          predicate: ((null is null and (_col22 = _col51)) and (_col1 = _col26)) (type: boolean)
{code}

This may be why vectorization is disabled for Map 1 with CBO off: the operator does not appear in the plan with CBO on, where Map 1 runs in vectorized execution mode.

Query 
{code}
select 
    count(*)
from
    (SELECT 
        'store' as channel,
            'ss_addr_sk' col_name,
            d_year,
            d_qoy,
            i_category,
            ss_ext_sales_price ext_sales_price
    FROM
        store_sales, item, date_dim
    WHERE
        ss_addr_sk IS NULL
            AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
            AND store_sales.ss_item_sk = item.i_item_sk) a;
{code}
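
The comparison described above can be sketched as the session below. This is a minimal sketch, not the exact session used for this report: it assumes Tez as the execution engine and assumes that hive.cbo.enable and hive.vectorized.execution.enabled are the only settings changed between the two runs. The settings actually in effect are not listed in this report.
{code}
-- Assumed session settings; the report does not record the ones actually used.
set hive.execution.engine=tez;
set hive.vectorized.execution.enabled=true;

-- Plan with CBO off: check Map 1 for the extra Filter Operator after the
-- second Map Join Operator and for a missing "Execution mode: vectorized" line.
set hive.cbo.enable=false;
explain
select count(*)
from (SELECT 'store' as channel, 'ss_addr_sk' col_name, d_year, d_qoy,
             i_category, ss_ext_sales_price ext_sales_price
      FROM store_sales, item, date_dim
      WHERE ss_addr_sk IS NULL
        AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
        AND store_sales.ss_item_sk = item.i_item_sk) a;

-- Plan with CBO on: the same query, for comparison against the plan above.
set hive.cbo.enable=true;
explain
select count(*)
from (SELECT 'store' as channel, 'ss_addr_sk' col_name, d_year, d_qoy,
             i_category, ss_ext_sales_price ext_sales_price
      FROM store_sales, item, date_dim
      WHERE ss_addr_sk IS NULL
        AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
        AND store_sales.ss_item_sk = item.i_item_sk) a;
{code}
Comparing the Map 1 section of the two outputs is enough to see the difference; Map 3 and Map 4 are vectorized in both plans.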

Explain with CBO off
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 3 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: mmokhtar_20141210171212_02c36f60-ceea-4e18-a266-5baecfd023f2:6
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  filterExpr: (ss_item_sk is not null and ss_addr_sk is null) (type: boolean)
                  Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (ss_item_sk is not null and ss_addr_sk is null) (type: boolean)
                    Statistics: Num rows: 1946839900 Data size: 23178336456 Basic stats: COMPLETE Column stats: COMPLETE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {ss_item_sk} {ss_sold_date_sk}
                        1 {i_item_sk}
                      keys:
                        0 ss_item_sk (type: int)
                        1 i_item_sk (type: int)
                      outputColumnNames: _col1, _col22, _col26
                      input vertices:
                        1 Map 4
                      Statistics: Num rows: 1946839936 Data size: 23362079232 Basic stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        condition expressions:
                          0 {_col1} {_col22} {_col26}
                          1 {d_date_sk}
                        keys:
                          0 _col22 (type: int)
                          1 d_date_sk (type: int)
                        outputColumnNames: _col1, _col22, _col26, _col51
                        input vertices:
                          1 Map 3
                        Statistics: Num rows: 2176800197 Data size: 34828803152 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ((null is null and (_col22 = _col51)) and (_col1 = _col26)) (type: boolean)
                          Statistics: Num rows: 272100024 Data size: 4353600384 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            Statistics: Num rows: 272100024 Data size: 4353600384 Basic stats: COMPLETE Column stats: COMPLETE
                            Group By Operator
                              aggregations: count()
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                              Reduce Output Operator
                                sort order:
                                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                value expressions: _col0 (type: bigint)
        Map 3
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: d_date_sk is not null (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: d_date_sk is not null (type: boolean)
                    Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: d_date_sk (type: int)
                      sort order: +
                      Map-reduce partition columns: d_date_sk (type: int)
                      Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                      Group By Operator
                        keys: _col0 (type: int)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
                        Dynamic Partitioning Event Operator
                          Target Input: store_sales
                          Partition key expr: ss_sold_date_sk
                          Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
                          Target column: ss_sold_date_sk
                          Target Vertex: Map 1
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: item
                  filterExpr: i_item_sk is not null (type: boolean)
                  Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: i_item_sk is not null (type: boolean)
                    Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: i_item_sk (type: int)
                      sort order: +
                      Map-reduce partition columns: i_item_sk (type: int)
                      Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: bigint)
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}


Explain with CBO on 
{code}
STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 3 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: mmokhtar_20141210171212_495d0eb9-d176-43d3-8101-84821a0c0fdf:5
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  filterExpr: (ss_addr_sk is null and ss_item_sk is not null) (type: boolean)
                  Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (ss_addr_sk is null and ss_item_sk is not null) (type: boolean)
                    Statistics: Num rows: 1946839900 Data size: 23178336456 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: ss_item_sk (type: int), ss_sold_date_sk (type: int)
                      outputColumnNames: _col0, _col2
                      Statistics: Num rows: 1946839900 Data size: 15574719200 Basic stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        condition expressions:
                          0
                          1 {_col2}
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col3
                        input vertices:
                          0 Map 4
                        Statistics: Num rows: 1946839936 Data size: 7787359744 Basic stats: COMPLETE Column stats: COMPLETE
                        Select Operator
                          expressions: _col3 (type: int)
                          outputColumnNames: _col3
                          Statistics: Num rows: 1946839936 Data size: 7787359744 Basic stats: COMPLETE Column stats: COMPLETE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            condition expressions:
                              0
                              1
                            keys:
                              0 _col0 (type: int)
                              1 _col3 (type: int)
                            input vertices:
                              0 Map 3
                            Statistics: Num rows: 3232152511019 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                            Select Operator
                              Statistics: Num rows: 3232152511019 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                              Group By Operator
                                aggregations: count()
                                mode: hash
                                outputColumnNames: _col0
                                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                Reduce Output Operator
                                  sort order:
                                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                  value expressions: _col0 (type: bigint)
            Execution mode: vectorized
        Map 3
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: d_date_sk is not null (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: d_date_sk is not null (type: boolean)
                    Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: int)
                        outputColumnNames: _col0
                        Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                        Group By Operator
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
                          Dynamic Partitioning Event Operator
                            Target Input: store_sales
                            Partition key expr: ss_sold_date_sk
                            Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
                            Target column: ss_sold_date_sk
                            Target Vertex: Map 1
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: item
                  filterExpr: i_item_sk is not null (type: boolean)
                  Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: i_item_sk is not null (type: boolean)
                    Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: i_item_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: bigint)
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 3.874 seconds, Fetched: 144 row(s)
{code}


