Posted to dev@hive.apache.org by Chao Sun <ch...@cloudera.com> on 2014/11/21 01:57:14 UTC
Review Request 28307: HIVE-8908 - Investigate test failure on join34.q [Spark Branch]
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28307/
-----------------------------------------------------------
Review request for hive, Jimmy Xiang and Szehon Ho.
Bugs: HIVE-8908
https://issues.apache.org/jira/browse/HIVE-8908
Repository: hive-git
Description
-------
For this query, the plan doesn't look correct:
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-1 depends on stages: Stage-5, Stage-4
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0
  Stage-5 is a root stage
STAGE PLANS:
  Stage: Stage-4
    Spark
      DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:6
      Vertices:
        Map 4
          Map Operator Tree:
            TableScan
              alias: x
              Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: key is not null (type: boolean)
                Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                Spark HashTable Sink Operator
                  condition expressions:
                    0 {_col1}
                    1 {value}
                  keys:
                    0 _col0 (type: string)
                    1 key (type: string)
                Reduce Output Operator
                  key expressions: key (type: string)
                  sort order: +
                  Map-reduce partition columns: key (type: string)
                  Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                  value expressions: value (type: string)
          Local Work:
            Map Reduce Local Work
  Stage: Stage-1
    Spark
      Edges:
        Union 2 <- Map 1 (NONE, 0), Map 3 (NONE, 0)
      DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:4
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: x
              Filter Operator
                predicate: (key < 20) (type: boolean)
                Select Operator
                  expressions: key (type: string), value (type: string)
                  outputColumnNames: _col0, _col1
                  Map Join Operator
                    condition map:
                      Inner Join 0 to 1
                    condition expressions:
                      0 {_col1}
                      1 {key} {value}
                    keys:
                      0 _col0 (type: string)
                      1 key (type: string)
                    outputColumnNames: _col1, _col2, _col3
                    input vertices:
                      1 Map 4
                    Select Operator
                      expressions: _col2 (type: string), _col3 (type: string), _col1 (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      File Output Operator
                        compressed: false
                        table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          name: default.dest_j1
          Local Work:
            Map Reduce Local Work
        Map 3
          Map Operator Tree:
            TableScan
              alias: x1
              Filter Operator
                predicate: (key > 100) (type: boolean)
                Select Operator
                  expressions: key (type: string), value (type: string)
                  outputColumnNames: _col0, _col1
                  Map Join Operator
                    condition map:
                      Inner Join 0 to 1
                    condition expressions:
                      0 {_col1}
                      1 {key} {value}
                    keys:
                      0 _col0 (type: string)
                      1 key (type: string)
                    outputColumnNames: _col1, _col2, _col3
                    input vertices:
                      1 Map 4
                    Select Operator
                      expressions: _col2 (type: string), _col3 (type: string), _col1 (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      File Output Operator
                        compressed: false
                        table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          name: default.dest_j1
          Local Work:
            Map Reduce Local Work
        Union 2
          Vertex: Union 2
  Stage: Stage-2
    Dependency Collection
  Stage: Stage-0
    Move Operator
      tables:
        replace: true
        table:
          input format: org.apache.hadoop.mapred.TextInputFormat
          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          name: default.dest_j1
  Stage: Stage-3
    Stats-Aggr Operator
  Stage: Stage-5
    Spark
      DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:5
      Vertices:
        Map 4
          Map Operator Tree:
            TableScan
              alias: x
              Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: key is not null (type: boolean)
                Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                Spark HashTable Sink Operator
                  condition expressions:
                    0 {_col1}
                    1 {value}
                  keys:
                    0 _col0 (type: string)
                    1 key (type: string)
                Reduce Output Operator
                  key expressions: key (type: string)
                  sort order: +
                  Map-reduce partition columns: key (type: string)
                  Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                  value expressions: value (type: string)
          Local Work:
            Map Reduce Local Work
Time taken: 0.127 seconds, Fetched: 156 row(s)
Note that Stage-4 and Stage-5 are identical apart from the DagName suffix. Also, in Stage-4 there is a Reduce Output (RS) operator in parallel with the Spark HashTable Sink (HTS) operator, which is strange.
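To check for duplicated stages like this mechanically, one option is to compare the stage bodies in the EXPLAIN text. The following is only a rough sketch, not part of the patch: the helper names `split_stages` and `duplicate_stages` are hypothetical, and it assumes that stages begin with a "Stage: <name>" line, that DagName lines can be ignored because they differ per stage, and that a trailing "Time taken:" line ends the plan.

```python
from collections import defaultdict


def split_stages(plan_text):
    """Map each stage name to its body lines from the STAGE PLANS section.

    Assumptions (not taken from Hive itself): every stage starts with a
    'Stage: <name>' line, DagName lines are ignored because they carry a
    per-stage suffix, and a 'Time taken:' line marks the end of the plan.
    """
    stages = {}
    current = None
    in_plans = False
    for raw in plan_text.splitlines():
        line = raw.strip()  # indentation is ignored on purpose
        if line == "STAGE PLANS:":
            in_plans = True
            continue
        if not in_plans or not line:
            continue
        if line.startswith("Time taken:"):
            break
        if line.startswith("Stage: "):
            current = line[len("Stage: "):]
            stages[current] = []
        elif current is not None and not line.startswith("DagName:"):
            stages[current].append(line)
    return stages


def duplicate_stages(plan_text):
    """Return sorted groups of stage names whose bodies are identical."""
    groups = defaultdict(list)
    for name, body in split_stages(plan_text).items():
        groups[tuple(body)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

Passing the plan text above to `duplicate_stages` would be expected to group Stage-4 with Stage-5. Because the comparison strips indentation, it only shows that two stages list the same operators in the same order; it does not confirm that the operator trees have the same shape.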
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 4bfc26f
Diff: https://reviews.apache.org/r/28307/diff/
Testing
-------
Thanks,
Chao Sun