Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2014/12/10 23:14:13 UTC
[jira] [Created] (HIVE-9068) Hive : With CBO disabled Vectorization
in Map join disabled causing 100% increase in elapsed time and CPU
(possibly due to redundant filter operator)
Mostafa Mokhtar created HIVE-9068:
-------------------------------------
Summary: Hive : With CBO disabled Vectorization in Map join disabled causing 100% increase in elapsed time and CPU (possibly due to redundant filter operator)
Key: HIVE-9068
URL: https://issues.apache.org/jira/browse/HIVE-9068
Project: Hive
Issue Type: Bug
Components: Vectorization
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
Fix For: 0.14.1
With CBO off, the plan contains a redundant Filter Operator after the second Map Join:
{code}
Filter Operator
predicate: ((null is null and (_col22 = _col51)) and (_col1 = _col26)) (type: boolean)
{code}
This may be why vectorization is disabled for Map 1 when CBO is off; the operator does not appear in the plan with CBO on.
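To illustrate why the filter looks redundant: `null is null` is constant true, and the two remaining equalities appear to repeat the Map Join keys (reading the CBO-off plan, _col1/_col26 look like ss_item_sk/i_item_sk and _col22/_col51 like ss_sold_date_sk/d_date_sk; that mapping is inferred from the outputColumnNames, not confirmed). The Python sketch below is only a toy model with hypothetical in-memory rows and helper names, not Hive's evaluator. It shows that every row produced by the two inner joins already satisfies the predicate.

```python
def redundant_filter(row):
    """Toy evaluation of
    ((null is null and (_col22 = _col51)) and (_col1 = _col26))
    for one joined row. Join keys are assumed non-null, as the
    upstream 'is not null' filters in the plan guarantee."""
    null_is_null = None is None  # constant sub-expression: always True
    return (null_is_null and row["_col22"] == row["_col51"]) and (
        row["_col1"] == row["_col26"]
    )


def inner_join_rows(store_sales, item, date_dim):
    """Hypothetical nested-loop stand-in for the two Map Joins:
    ss_item_sk = i_item_sk, then ss_sold_date_sk = d_date_sk."""
    joined = []
    for ss in store_sales:
        for it in item:
            if ss["ss_item_sk"] != it["i_item_sk"]:
                continue
            for d in date_dim:
                if ss["ss_sold_date_sk"] != d["d_date_sk"]:
                    continue
                joined.append(
                    {
                        "_col1": ss["ss_item_sk"],
                        "_col22": ss["ss_sold_date_sk"],
                        "_col26": it["i_item_sk"],
                        "_col51": d["d_date_sk"],
                    }
                )
    return joined
```

If the column mapping above is right, the filter can never reject a row that survived the joins, so it only adds per-row work on the large side of the join.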
Query
{code}
select
    count(*)
from
    (SELECT
        'store' as channel,
        'ss_addr_sk' col_name,
        d_year,
        d_qoy,
        i_category,
        ss_ext_sales_price ext_sales_price
    FROM
        store_sales, item, date_dim
    WHERE
        ss_addr_sk IS NULL
        AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
        AND store_sales.ss_item_sk = item.i_item_sk) a;
{code}
Explain with CBO off
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Map 1 <- Map 3 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)
DagName: mmokhtar_20141210171212_02c36f60-ceea-4e18-a266-5baecfd023f2:6
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: store_sales
filterExpr: (ss_item_sk is not null and ss_addr_sk is null) (type: boolean)
Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (ss_item_sk is not null and ss_addr_sk is null) (type: boolean)
Statistics: Num rows: 1946839900 Data size: 23178336456 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {ss_item_sk} {ss_sold_date_sk}
1 {i_item_sk}
keys:
0 ss_item_sk (type: int)
1 i_item_sk (type: int)
outputColumnNames: _col1, _col22, _col26
input vertices:
1 Map 4
Statistics: Num rows: 1946839936 Data size: 23362079232 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {_col1} {_col22} {_col26}
1 {d_date_sk}
keys:
0 _col22 (type: int)
1 d_date_sk (type: int)
outputColumnNames: _col1, _col22, _col26, _col51
input vertices:
1 Map 3
Statistics: Num rows: 2176800197 Data size: 34828803152 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((null is null and (_col22 = _col51)) and (_col1 = _col26)) (type: boolean)
Statistics: Num rows: 272100024 Data size: 4353600384 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
Statistics: Num rows: 272100024 Data size: 4353600384 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: count()
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: bigint)
Map 3
Map Operator Tree:
TableScan
alias: date_dim
filterExpr: d_date_sk is not null (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: d_date_sk is not null (type: boolean)
Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: d_date_sk (type: int)
sort order: +
Map-reduce partition columns: d_date_sk (type: int)
Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: d_date_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
Dynamic Partitioning Event Operator
Target Input: store_sales
Partition key expr: ss_sold_date_sk
Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
Target column: ss_sold_date_sk
Target Vertex: Map 1
Execution mode: vectorized
Map 4
Map Operator Tree:
TableScan
alias: item
filterExpr: i_item_sk is not null (type: boolean)
Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: i_item_sk is not null (type: boolean)
Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: i_item_sk (type: int)
sort order: +
Map-reduce partition columns: i_item_sk (type: int)
Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Reducer 2
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
Explain with CBO on
{code}
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Map 1 <- Map 3 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)
DagName: mmokhtar_20141210171212_495d0eb9-d176-43d3-8101-84821a0c0fdf:5
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: store_sales
filterExpr: (ss_addr_sk is null and ss_item_sk is not null) (type: boolean)
Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (ss_addr_sk is null and ss_item_sk is not null) (type: boolean)
Statistics: Num rows: 1946839900 Data size: 23178336456 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: ss_item_sk (type: int), ss_sold_date_sk (type: int)
outputColumnNames: _col0, _col2
Statistics: Num rows: 1946839900 Data size: 15574719200 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0
1 {_col2}
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col3
input vertices:
0 Map 4
Statistics: Num rows: 1946839936 Data size: 7787359744 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col3 (type: int)
outputColumnNames: _col3
Statistics: Num rows: 1946839936 Data size: 7787359744 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0
1
keys:
0 _col0 (type: int)
1 _col3 (type: int)
input vertices:
0 Map 3
Statistics: Num rows: 3232152511019 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
Statistics: Num rows: 3232152511019 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
Group By Operator
aggregations: count()
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: bigint)
Execution mode: vectorized
Map 3
Map Operator Tree:
TableScan
alias: date_dim
filterExpr: d_date_sk is not null (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: d_date_sk is not null (type: boolean)
Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: d_date_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
Dynamic Partitioning Event Operator
Target Input: store_sales
Partition key expr: ss_sold_date_sk
Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
Target column: ss_sold_date_sk
Target Vertex: Map 1
Execution mode: vectorized
Map 4
Map Operator Tree:
TableScan
alias: item
filterExpr: i_item_sk is not null (type: boolean)
Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: i_item_sk is not null (type: boolean)
Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: i_item_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Reducer 2
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 3.874 seconds, Fetched: 144 row(s)
{code}
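The difference between the two plans is easiest to see per vertex: with CBO off, Map 1 has no "Execution mode: vectorized" line, while with CBO on it does. A minimal Python sketch for pulling that out of saved EXPLAIN text follows. The function name is hypothetical, and it assumes the layout shown above, where a vertex starts with a bare "Map N" or "Reducer N" line and the execution mode, if printed, follows that vertex's operator tree.

```python
import re

# A vertex header is a line that is exactly "Map N" or "Reducer N".
# Edge lines ("Map 1 <- Map 3 ...") and "input vertices" entries
# ("1 Map 4") do not match because of the anchors.
VERTEX_RE = re.compile(r"^(Map|Reducer) \d+$")


def execution_modes(plan_text):
    """Return {vertex name: execution mode or None} for an EXPLAIN dump.

    None means no 'Execution mode:' line was printed for that vertex,
    which in the plans above coincides with non-vectorized execution
    (an assumption about this explain format)."""
    modes = {}
    current = None
    for raw in plan_text.splitlines():
        line = raw.strip()
        if VERTEX_RE.match(line):
            current = line
            modes[current] = None
        elif line.startswith("Stage: "):
            # A new stage ends the current vertex's block.
            current = None
        elif line.startswith("Execution mode:") and current is not None:
            modes[current] = line.split(":", 1)[1].strip()
    return modes
```

Run against the two plans in this report, the CBO-off output should show Map 1 without a mode while Map 3 and Map 4 report "vectorized"; the CBO-on output should show all three map vertices as vectorized.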