Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2015/02/15 00:32:11 UTC
[jira] [Created] (HIVE-9695) Redundant filter operator in reducer Vertex when CBO is disabled
Mostafa Mokhtar created HIVE-9695:
-------------------------------------
Summary: Redundant filter operator in reducer Vertex when CBO is disabled
Key: HIVE-9695
URL: https://issues.apache.org/jira/browse/HIVE-9695
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Gunther Hagleitner
Fix For: 1.2.0
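With CBO disabled, the plan for the query below contains a redundant Filter Operator in the reducer vertex (Reducer 2). That filter runs after the Merge Join and Map Join and re-applies the entire WHERE clause, including the join equalities those two operators already enforce. With CBO enabled, the reducer vertex (Reducer 3) has no Filter Operator at all.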
Query
{code}
select
ss_item_sk, ss_ticket_number, ss_store_sk
from
store_sales a, store_returns b, store
where
a.ss_item_sk = b.sr_item_sk
and a.ss_ticket_number = b.sr_ticket_number
and ss_sold_date_sk between 2450816 and 2451500
and sr_returned_date_sk between 2450816 and 2451500
and s_store_sk = ss_store_sk;
{code}
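The following is a minimal repro sketch. It assumes the TPC-DS tables store_sales, store_returns and store are already loaded, and that Tez is the execution engine (the plans below are Tez plans). It toggles hive.cbo.enable to produce the two plans. The issue does not state the reporter's other session settings, such as vectorization or column-statistics options, so they are not reproduced here.

{code:sql}
-- Repro sketch (assumption: TPC-DS tables store_sales, store_returns, store exist).
set hive.execution.engine=tez;

-- CBO disabled: per this report, Reducer 2 ends with a Filter Operator after the Map Join.
set hive.cbo.enable=false;
explain
select ss_item_sk, ss_ticket_number, ss_store_sk
from store_sales a, store_returns b, store
where a.ss_item_sk = b.sr_item_sk
  and a.ss_ticket_number = b.sr_ticket_number
  and ss_sold_date_sk between 2450816 and 2451500
  and sr_returned_date_sk between 2450816 and 2451500
  and s_store_sk = ss_store_sk;

-- CBO enabled: same query, for comparison with the second plan below.
set hive.cbo.enable=true;
explain
select ss_item_sk, ss_ticket_number, ss_store_sk
from store_sales a, store_returns b, store
where a.ss_item_sk = b.sr_item_sk
  and a.ss_ticket_number = b.sr_ticket_number
  and ss_sold_date_sk between 2450816 and 2451500
  and sr_returned_date_sk between 2450816 and 2451500
  and s_store_sk = ss_store_sk;
{code}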
Plan snippet (Reducer 2, CBO disabled)
{code}
Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
  predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
{code}
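Below, the same predicate is written with the _colN names mapped back to source columns. The mapping is our reading of the Merge Join and Map Join condition expressions in the full CBO-disabled plan. Input 0 is alias a, because Map 4 (alias a) ships two value columns, matching the two VALUE entries for input 0. Hive does not print this mapping. The block is a predicate fragment for reading, not a complete statement.

{code:sql}
-- Reducer 2 Filter Operator predicate, rewritten with source column names.
-- Inferred mapping (not printed by Hive):
--   _col1  = a.ss_item_sk,        _col8  = a.ss_ticket_number  (Merge Join input 0 keys)
--   _col6  = a.ss_store_sk,       _col22 = a.ss_sold_date_sk   (Merge Join input 0 values)
--   _col27 = b.sr_item_sk,        _col34 = b.sr_ticket_number  (Merge Join input 1 keys)
--   _col45 = b.sr_returned_date_sk                             (Merge Join input 1 value)
--   _col49 = store.s_store_sk                                  (Map Join build side, Map 3)
where a.ss_item_sk = b.sr_item_sk                        -- _col1 = _col27: Merge Join key
  and a.ss_ticket_number = b.sr_ticket_number            -- _col8 = _col34: Merge Join key
  and a.ss_sold_date_sk between 2450816 and 2451500      -- _col22
  and b.sr_returned_date_sk between 2450816 and 2451500  -- _col45
  and store.s_store_sk = a.ss_store_sk                   -- _col49 = _col6: Map Join key
{code}

The three equality conjuncts are exactly the join keys, so they cannot reject any row the inner joins emit. Only the two date-range conjuncts could remove rows, and they do not appear anywhere in the CBO-enabled plan.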
Full plan with CBO disabled
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 (SIMPLE_EDGE)
      DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: b
              filterExpr: ((sr_item_sk is not null and sr_ticket_number is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
              Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
              Filter Operator
                predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                  sort order: ++
                  Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
                  Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions: sr_returned_date_sk (type: int)
          Execution mode: vectorized
        Map 3
          Map Operator Tree:
            TableScan
              alias: store
              filterExpr: s_store_sk is not null (type: boolean)
              Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
              Filter Operator
                predicate: s_store_sk is not null (type: boolean)
                Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: s_store_sk (type: int)
                  sort order: +
                  Map-reduce partition columns: s_store_sk (type: int)
                  Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
          Execution mode: vectorized
        Map 4
          Map Operator Tree:
            TableScan
              alias: a
              filterExpr: (((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
              Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
              Filter Operator
                predicate: ((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) (type: boolean)
                Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
                  sort order: ++
                  Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
                  Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions: ss_store_sk (type: int), ss_sold_date_sk (type: int)
          Execution mode: vectorized
        Reducer 2
          Reduce Operator Tree:
            Merge Join Operator
              condition map:
                Inner Join 0 to 1
              condition expressions:
                0 {KEY.reducesinkkey0} {VALUE._col5} {KEY.reducesinkkey1} {VALUE._col20}
                1 {KEY.reducesinkkey0} {KEY.reducesinkkey1} {VALUE._col17}
              outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45
              Statistics: Num rows: 57439343 Data size: 1148786860 Basic stats: COMPLETE Column stats: COMPLETE
              Map Join Operator
                condition map:
                  Inner Join 0 to 1
                condition expressions:
                  0 {_col1} {_col6} {_col8} {_col22} {_col27} {_col34} {_col45}
                  1 {s_store_sk}
                keys:
                  0 _col6 (type: int)
                  1 s_store_sk (type: int)
                outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45, _col49
                input vertices:
                  1 Map 3
                Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
                Filter Operator
                  predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
                  Statistics: Num rows: 1794979 Data size: 57439328 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
                      table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
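The Reducer 2 estimates are consistent with the filter being costed as five conjuncts of selectivity 1/2 each, a 2^5 = 32x reduction, even though three of those conjuncts are implied by the joins. The 1/2-per-conjunct rule is an assumption inferred from the numbers, not checked against Hive's statistics code. The query below is a quick arithmetic check using figures copied from the plan above.

{code:sql}
-- Figures from the Map Join (57439344 rows, 1838059008 bytes) and Filter Operator statistics.
-- Assumption: each of the 5 filter conjuncts was given selectivity 1/2.
select
  1838059008 / 57439344             as map_join_bytes_per_row,  -- 32.0
  floor(57439344 / pow(2, 5))       as est_filter_rows,         -- 1794979
  floor(57439344 / pow(2, 5)) * 32  as est_filter_bytes;        -- 57439328
{code}

Both results match the Filter Operator statistics (1794979 rows, 57439328 bytes). The CBO-enabled plan has no such filter and estimates 75912751 rows for its final join.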
Full plan with CBO enabled
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 4 <- Map 1 (BROADCAST_EDGE)
        Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
      DagName: mmokhtar_20150214182525_63a9838f-db9f-40e9-8ae1-77c77143dccf:12
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: store
              filterExpr: s_store_sk is not null (type: boolean)
              Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
              Filter Operator
                predicate: s_store_sk is not null (type: boolean)
                Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: s_store_sk (type: int)
                  outputColumnNames: _col0
                  Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: int)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: int)
                    Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
          Execution mode: vectorized
        Map 2
          Map Operator Tree:
            TableScan
              alias: b
              filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
              Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
              Filter Operator
                predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: int), _col1 (type: int)
                    sort order: ++
                    Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
                    Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
          Execution mode: vectorized
        Map 4
          Map Operator Tree:
            TableScan
              alias: a
              filterExpr: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
              Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
              Filter Operator
                predicate: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
                Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: ss_item_sk (type: int), ss_store_sk (type: int), ss_ticket_number (type: int)
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
                  Map Join Operator
                    condition map:
                      Inner Join 0 to 1
                    condition expressions:
                      0 {_col0} {_col1} {_col2}
                      1
                    keys:
                      0 _col1 (type: int)
                      1 _col0 (type: int)
                    outputColumnNames: _col0, _col1, _col2
                    input vertices:
                      1 Map 1
                    Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int), _col2 (type: int)
                      sort order: ++
                      Map-reduce partition columns: _col0 (type: int), _col2 (type: int)
                      Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col1 (type: int)
          Execution mode: vectorized
        Reducer 3
          Reduce Operator Tree:
            Merge Join Operator
              condition map:
                Inner Join 0 to 1
              condition expressions:
                0 {KEY.reducesinkkey0} {VALUE._col0} {KEY.reducesinkkey1}
                1
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
              Select Operator
                expressions: _col0 (type: int), _col2 (type: int), _col1 (type: int)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}