You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "liyunzhang (JIRA)" <ji...@apache.org> on 2017/10/27 09:12:00 UTC
[jira] [Commented] (HIVE-17486) Enable SharedWorkOptimizer in tez
on HOS
[ https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221989#comment-16221989 ]
liyunzhang commented on HIVE-17486:
-----------------------------------
mapjoin.q
{code}
set hive.mapred.mode=nonstrict;
set hive.explain.user=false;
set hive.auto.convert.join=true;
explain
select src1.key, src1.cnt1, src2.cnt1 from
(
select key, count(*) as cnt1 from
(
select a.key as key, a.value as val1, b.value as val2 from tbl1 a join tbl2 b on a.key = b.key
) subq1 group by key
) src1
join
(
select key, count(*) as cnt1 from
(
select a.key as key, a.value as val1, b.value as val2 from tbl1 a join tbl2 b on a.key = b.key
) subq2 group by key
) src2
on src1.key = src2.key
{code}
before shared work optimization, the physical plan in Tez
{code}
TS[0]-FIL[41]-SEL[2]-MAPJOIN[45]-GBY[10]-RS[11]-GBY[12]-MAPJOIN[47]-SEL[31]-FS[32]
TS[3]-FIL[42]-SEL[5]-RS[7]-MAPJOIN[45]
TS[14]-FIL[43]-SEL[16]-MAPJOIN[46]-GBY[24]-RS[25]-GBY[26]-RS[29]-MAPJOIN[47]
TS[17]-FIL[44]-SEL[19]-RS[21]-MAPJOIN[46]
{code}
after the optimization
{code}
TS[0]-FIL[41]-SEL[2]-MAPJOIN[45]-GBY[10]-RS[11]-GBY[12]-MAPJOIN[47]-SEL[31]-FS[32]
-RS[25]-GBY[26]-RS[29]-MAPJOIN[47]
TS[3]-FIL[42]-SEL[5]-RS[7]-MAPJOIN[45]
{code}
so the tez explain
before
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
DagId: root_20171027044107_4977290a-6856-41c5-b0e3-24b4ec32c59e:1
Edges:
Map 1 <- Map 3 (BROADCAST_EDGE)
Map 4 <- Map 6 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 5 (BROADCAST_EDGE)
Reducer 5 <- Map 4 (SIMPLE_EDGE)
DagName: root_20171027044107_4977290a-6856-41c5-b0e3-24b4ec32c59e:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: a
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0
input vertices:
1 Map 3
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
HybridGraceHashJoin: true
Group By Operator
aggregations: count()
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Map 3
Map Operator Tree:
TableScan
alias: b
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Map 4
Map Operator Tree:
TableScan
alias: a
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0
input vertices:
1 Map 6
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
HybridGraceHashJoin: true
Group By Operator
aggregations: count()
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Map 6
Map Operator Tree:
TableScan
alias: b
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Reducer 2
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col1, _col3
input vertices:
1 Reducer 5
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
HybridGraceHashJoin: true
Select Operator
expressions: _col0 (type: int), _col3 (type: bigint), _col1 (type: bigint)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 5
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
After
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
DagId: root_20171027043345_8893c266-bde2-4f23-9914-0dfccc8be5ea:1
Edges:
Map 1 <- Map 4 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 3 (BROADCAST_EDGE)
Reducer 3 <- Map 1 (SIMPLE_EDGE)
DagName: root_20171027043345_8893c266-bde2-4f23-9914-0dfccc8be5ea:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: a
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0
input vertices:
1 Map 4
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
HybridGraceHashJoin: true
Group By Operator
aggregations: count()
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Map 4
Map Operator Tree:
TableScan
alias: b
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Reducer 2
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col1, _col3
input vertices:
1 Reducer 3
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
HybridGraceHashJoin: true
Select Operator
expressions: _col0 (type: int), _col3 (type: bigint), _col1 (type: bigint)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 3
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
We can see that there are only 2 Maps(Map1 and Map4) in the after explain (in before explain, there are 4 Maps).
When i tried to change the physical plan for HOS like what HOT does. I found change of the optimization
before
{code}
STAGE DEPENDENCIES:
Stage-3 is a root stage
Stage-2 depends on stages: Stage-3
Stage-4 depends on stages: Stage-2
Stage-1 depends on stages: Stage-4
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-3
Spark
DagName: root_20171027045349_0dc8e597-ffad-4da7-bbfc-c2b7533d9b58:4
Vertices:
Map 6
Map Operator Tree:
TableScan
alias: b
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 _col0 (type: int)
1 _col0 (type: int)
Local Work:
Map Reduce Local Work
Stage: Stage-2
Spark
Edges:
Reducer 5 <- Map 4 (GROUP, 1)
DagName: root_20171027045349_0dc8e597-ffad-4da7-bbfc-c2b7533d9b58:2
Vertices:
Map 4
Map Operator Tree:
TableScan
alias: a
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0
input vertices:
1 Map 6
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count()
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Local Work:
Map Reduce Local Work
Reducer 5
Local Work:
Map Reduce Local Work
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 _col0 (type: int)
1 _col0 (type: int)
Stage: Stage-4
Spark
DagName: root_20171027045349_0dc8e597-ffad-4da7-bbfc-c2b7533d9b58:3
Vertices:
Map 3
Map Operator Tree:
TableScan
alias: b
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 _col0 (type: int)
1 _col0 (type: int)
Local Work:
Map Reduce Local Work
Stage: Stage-1
Spark
Edges:
Reducer 2 <- Map 1 (GROUP, 1)
DagName: root_20171027045349_0dc8e597-ffad-4da7-bbfc-c2b7533d9b58:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: a
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0
input vertices:
1 Map 3
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count()
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Local Work:
Map Reduce Local Work
Reducer 2
Local Work:
Map Reduce Local Work
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col1, _col3
input vertices:
1 Reducer 5
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col3 (type: bigint), _col1 (type: bigint)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
After
{code}
STAGE DEPENDENCIES:
Stage-3 is a root stage
Stage-2 depends on stages: Stage-3
Stage-4 depends on stages: Stage-2
Stage-1 depends on stages: Stage-4
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-3
Spark
DagName: root_20171027045324_b7f21ae7-9bcc-4077-8577-e6c6747dfcff:3
Vertices:
Map 4
Map Operator Tree:
TableScan
alias: b
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 _col0 (type: int)
1 _col0 (type: int)
Local Work:
Map Reduce Local Work
Stage: Stage-2
Spark
Edges:
Reducer 3 <- Map 6 (GROUP, 1)
DagName: root_20171027045324_b7f21ae7-9bcc-4077-8577-e6c6747dfcff:2
Vertices:
Map 6
Map Operator Tree:
TableScan
alias: a
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0
input vertices:
1 Map 4
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count()
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Local Work:
Map Reduce Local Work
Reducer 3
Local Work:
Map Reduce Local Work
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 _col0 (type: int)
1 _col0 (type: int)
Stage: Stage-4
Spark
DagName: root_20171027045324_b7f21ae7-9bcc-4077-8577-e6c6747dfcff:4
Vertices:
Map 4
Map Operator Tree:
TableScan
alias: b
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 _col0 (type: int)
1 _col0 (type: int)
Local Work:
Map Reduce Local Work
Stage: Stage-1
Spark
Edges:
Reducer 2 <- Map 5 (GROUP, 1)
DagName: root_20171027045324_b7f21ae7-9bcc-4077-8577-e6c6747dfcff:1
Vertices:
Map 5
Map Operator Tree:
TableScan
alias: a
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: key (type: int)
outputColumnNames: _col0
Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0
input vertices:
1 Map 4
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count()
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 11 Data size: 77 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Local Work:
Map Reduce Local Work
Reducer 2
Local Work:
Map Reduce Local Work
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col1, _col3
input vertices:
1 Reducer 3
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col3 (type: bigint), _col1 (type: bigint)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 5 Data size: 38 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
only the Map about table {{b}} are merged to 1 Map, there are still two Maps about table {{a}}(Map1,Map4).
The reason causes this is because [GenSparkWork|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java#L421] will split following physical operator tree once encounting RS
{code}
TS[0]-FIL[41]-SEL[2]-MAPJOIN[45]-GBY[10]-RS[11]-GBY[12]-MAPJOIN[47]-SEL[31]-FS[32]
-RS[25]-GBY[26]-RS[29]-MAPJOIN[47]
TS[3]-FIL[42]-SEL[5]-RS[7]-MAPJOIN[45]
{code} to
{code}
Map1:TS[0]-FIL[41]-SEL[2]-MAPJOIN[45]-GBY[10]-RS[11]
Reduce2:GBY[12]-MAPJOIN[47]-SEL[31]-FS[32]
Map3:TS[0]-FIL[41]-SEL[2]-MAPJOIN[45]-GBY[10]-RS[25]
Reduce4:GBY[26]-RS[29]
Map5:TS[3]-FIL[42]-SEL[5]-RS[7]
{code}
> Enable SharedWorkOptimizer in tez on HOS
> ----------------------------------------
>
> Key: HIVE-17486
> URL: https://issues.apache.org/jira/browse/HIVE-17486
> Project: Hive
> Issue Type: Bug
> Reporter: liyunzhang
> Assignee: liyunzhang
>
> in HIVE-16602, Implement shared scans with Tez.
> Given a query plan, the goal is to identify scans on input tables that can be merged so the data is read only once. Optimization will be carried out at the physical level. In Hive on Spark, it caches the result of spark work if the spark work is used by more than 1 child spark work. After sharedWorkOptimizer is enabled in physical plan in HoS, the identical table scans are merged to 1 table scan. This result of table scan will be used by more 1 child spark work. Thus we need not do the same computation because of cache mechanism.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)