Posted to issues@hive.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2017/10/23 05:25:00 UTC
[jira] [Comment Edited] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs
[ https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214644#comment-16214644 ]
liyunzhang_intel edited comment on HIVE-17193 at 10/23/17 5:24 AM:
-------------------------------------------------------------------
I can reproduce the issue after disabling CBO:
{code}
set hive.explain.user=false;
set hive.spark.dynamic.partition.pruning=true;
set hive.tez.dynamic.partition.pruning=true;
set hive.auto.convert.join=false;
set hive.cbo.enable=false;
explain
select * from
(select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
join
(select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
on a.key=b.key;
{code}
The explain output:
{code}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Spark
      DagName: root_20171023004308_4b3c304e-3deb-4193-846d-12cf9e6a50ab:2
      Vertices:
        Map 8
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                      Group By Operator
                        keys: _col0 (type: string)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                        Spark Partition Pruning Sink Operator
                          Target column: ds (string)
                          partition key expr: ds
                          Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                          target work: Map 1

  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL SORT, 1)
        Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 1), Reducer 6 (PARTITION-LEVEL SORT, 1)
        Reducer 6 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 7 (PARTITION-LEVEL SORT, 1)
      DagName: root_20171023004308_4b3c304e-3deb-4193-846d-12cf9e6a50ab:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: ds (type: string)
                      sort order: +
                      Map-reduce partition columns: ds (type: string)
                      Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                      value expressions: key (type: string)
        Map 4
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: key (type: string)
                      sort order: +
                      Map-reduce partition columns: key (type: string)
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
        Map 7
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: value is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: value (type: string)
                      sort order: +
                      Map-reduce partition columns: value (type: string)
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
        Reducer 2
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ds (type: string)
                  1 key (type: string)
                outputColumnNames: _col0, _col2
                Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col2 (type: string), _col0 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col1 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col1 (type: string)
                    Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col0 (type: string)
        Reducer 3
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 _col1 (type: string)
                  1 _col1 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 280 Data size: 28129 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 280 Data size: 28129 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 6
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ds (type: string)
                  1 value (type: string)
                outputColumnNames: _col0, _col2
                Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col2 (type: string), _col0 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col1 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col1 (type: string)
                    Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col0 (type: string)

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
With CBO disabled there is only one Map work for srcpart (Map 1, which feeds both Reducer 2 and Reducer 6). With CBO enabled the Map works for srcpart cannot be merged, because the RS (ReduceSink) operators in those Maps are considered different; with CBO disabled they are considered the same (see the attached [picture|https://issues.apache.org/jira/secure/attachment/12893484/17193_compare_RS_in_Map_5_1.PNG]).
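For comparison, the CBO-enabled plan can be obtained by rerunning the same explain with only {{hive.cbo.enable}} flipped. This is just a sketch and not something that was run for this comment: it assumes the session is already on Hive on Spark ({{hive.execution.engine=spark}}) and that {{srcpart}} is partitioned by {{ds}}. The resulting plan is not reproduced here.
{code}
-- Assumption: same settings as the reproduction above; only CBO differs.
set hive.explain.user=false;
set hive.spark.dynamic.partition.pruning=true;
set hive.tez.dynamic.partition.pruning=true;
set hive.auto.convert.join=false;
set hive.cbo.enable=true;
explain
select * from
(select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
join
(select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
on a.key=b.key;
{code}
Counting the Map vertices whose TableScan alias is srcpart in each plan should show whether the Map works were combined.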
> HoS: don't combine map works that are targets of different DPPs
> ---------------------------------------------------------------
>
> Key: HIVE-17193
> URL: https://issues.apache.org/jira/browse/HIVE-17193
> Project: Hive
> Issue Type: Bug
> Reporter: Rui Li
> Assignee: Rui Li
>
> Suppose {{srcpart}} is partitioned by {{ds}}. The following query can trigger the issue:
> {code}
> explain
> select * from
> (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
> join
> (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
> on a.key=b.key;
> {code}