You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2015/02/15 00:32:11 UTC

[jira] [Created] (HIVE-9695) Redundant filter operator in reducer Vertex when CBO is disabled

Mostafa Mokhtar created HIVE-9695:
-------------------------------------

             Summary: Redundant filter operator in reducer Vertex when CBO is disabled
                 Key: HIVE-9695
                 URL: https://issues.apache.org/jira/browse/HIVE-9695
             Project: Hive
          Issue Type: Bug
          Components: Physical Optimizer
    Affects Versions: 0.14.0
            Reporter: Mostafa Mokhtar
            Assignee: Gunther Hagleitner
             Fix For: 1.2.0


There is a redundant filter operator in reducer Vertex when CBO is disabled.

Query 
{code}
select 
        ss_item_sk, ss_ticket_number, ss_store_sk
    from
        store_sales a, store_returns b, store
    where
        a.ss_item_sk = b.sr_item_sk
            and a.ss_ticket_number = b.sr_ticket_number 
            and ss_sold_date_sk between 2450816 and 2451500
			and sr_returned_date_sk between 2450816 and 2451500
			and s_store_sk = ss_store_sk;
{code}
Plan snippet 
{code}
  Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
{code}

Full plan with CBO disabled
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 (SIMPLE_EDGE)
      DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: b
                  filterExpr: ((sr_item_sk is not null and sr_ticket_number is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
                  Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                    Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                      sort order: ++
                      Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
                      Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: sr_returned_date_sk (type: int)
            Execution mode: vectorized
        Map 3
            Map Operator Tree:
                TableScan
                  alias: store
                  filterExpr: s_store_sk is not null (type: boolean)
                  Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: s_store_sk is not null (type: boolean)
                    Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: s_store_sk (type: int)
                      sort order: +
                      Map-reduce partition columns: s_store_sk (type: int)
                      Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: a
                  filterExpr: (((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
                  Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) (type: boolean)
                    Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
                      sort order: ++
                      Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
                      Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: ss_store_sk (type: int), ss_sold_date_sk (type: int)
            Execution mode: vectorized
        Reducer 2
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {KEY.reducesinkkey0} {VALUE._col5} {KEY.reducesinkkey1} {VALUE._col20}
                  1 {KEY.reducesinkkey0} {KEY.reducesinkkey1} {VALUE._col17}
                outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45
                Statistics: Num rows: 57439343 Data size: 1148786860 Basic stats: COMPLETE Column stats: COMPLETE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  condition expressions:
                    0 {_col1} {_col6} {_col8} {_col22} {_col27} {_col34} {_col45}
                    1 {s_store_sk}
                  keys:
                    0 _col6 (type: int)
                    1 s_store_sk (type: int)
                  outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45, _col49
                  input vertices:
                    1 Map 3
                  Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
                    Statistics: Num rows: 1794979 Data size: 57439328 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

Full plan with CBO enabled
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 4 <- Map 1 (BROADCAST_EDGE)
        Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
      DagName: mmokhtar_20150214182525_63a9838f-db9f-40e9-8ae1-77c77143dccf:12
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store
                  filterExpr: s_store_sk is not null (type: boolean)
                  Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: s_store_sk is not null (type: boolean)
                    Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: s_store_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 2
            Map Operator Tree:
                TableScan
                  alias: b
                  filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                  Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                    Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int), _col1 (type: int)
                        sort order: ++
                        Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
                        Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: a
                  filterExpr: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
                  Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
                    Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: ss_item_sk (type: int), ss_store_sk (type: int), ss_ticket_number (type: int)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        condition expressions:
                          0 {_col0} {_col1} {_col2}
                          1
                        keys:
                          0 _col1 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0, _col1, _col2
                        input vertices:
                          1 Map 1
                        Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col0 (type: int), _col2 (type: int)
                          sort order: ++
                          Map-reduce partition columns: _col0 (type: int), _col2 (type: int)
                          Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col1 (type: int)
            Execution mode: vectorized
        Reducer 3
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {KEY.reducesinkkey0} {VALUE._col0} {KEY.reducesinkkey1}
                  1
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: int), _col2 (type: int), _col1 (type: int)
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)