You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Navis Ryu <na...@nexr.com> on 2014/07/04 03:59:32 UTC
Review Request 23271: Wrong results caused by hive ppd under specific join
condition
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23271/
-----------------------------------------------------------
Review request for hive.
Bugs: HIVE-7346
https://issues.apache.org/jira/browse/HIVE-7346
Repository: hive-git
Description
-------
Assuming two tables :
{code:sql} t1(id1 string,id2 string) , t2 (id string,d int) {code}
t1 contains 1 row : 'a','a'
t2 contains 1 row : 'a',2
The following query :
{code:sql} select a.*,b.d d1,c.d d2
from t1 a join t2 b on (a.id1=b.id)
join t2 c on (a.id2=b.id)
where b.d <=1 and c.d<=1 {code}
Returns 0 rows as expected because t2.d = 2
Wrapping this query, like so :
{code:sql} select * from (
select a.*,b.d d1,c.d d2
from t1 a join t2 b on (a.id1=b.id)
join t2 c on (a.id2=b.id)
where b.d <=1 and c.d<=1
) z where d1>1 or d2>1 {code}
Where another filter was add on the columns causes the plan to lack the filter of the "<=1" and return a single row - *Wrong Results*.
The plan is :
{code:sql}
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (<= (. (TOK_TABLE_OR_COL b) d) 1) (<= (. (TOK_TABLE_OR_COL c) d) 1))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (> (TOK_TABLE_OR_COL d1) 1) (> (TOK_TABLE_OR_COL d2) 1)))))
STAGE DEPENDENCIES:
Stage-7 is a root stage
Stage-5 depends on stages: Stage-7
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-7
Map Reduce Local Work
Alias -> Map Local Tables:
z:b
Fetch Operator
limit: -1
z:c
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
z:b
TableScan
alias: b
HashTable Sink Operator
condition expressions:
0 {id1} {id2}
1 {id} {d}
handleSkewJoin: false
keys:
0 [Column[id1]]
1 [Column[id]]
Position of Big Table: 0
z:c
TableScan
alias: c
HashTable Sink Operator
condition expressions:
0 {_col5} {_col0} {_col1}
1 {d}
handleSkewJoin: false
keys:
0 []
1 []
Position of Big Table: 0
Stage: Stage-5
Map Reduce
Alias -> Map Operator Tree:
z:a
TableScan
alias: a
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {id1} {id2}
1 {id} {d}
handleSkewJoin: false
keys:
0 [Column[id1]]
1 [Column[id]]
outputColumnNames: _col0, _col1, _col4, _col5
Position of Big Table: 0
Filter Operator
predicate:
expr: (_col1 = _col4)
type: boolean
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {_col5} {_col0} {_col1}
1 {d}
handleSkewJoin: false
keys:
0 []
1 []
outputColumnNames: _col1, _col4, _col5, _col9
Position of Big Table: 0
Filter Operator
predicate:
expr: ((_col1 > 1) or (_col9 > 1))
type: boolean
Select Operator
expressions:
expr: _col4
type: string
expr: _col5
type: string
expr: _col1
type: int
expr: _col9
type: int
outputColumnNames: _col0, _col1, _col2, _col3
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
{code}
Setting : {code:sql} hive.optimize.ppd=false {code}
Results in the following *correct* plan : {code:sql} ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (<= (. (TOK_TABLE_OR_COL b) d) 1) (<= (. (TOK_TABLE_OR_COL c) d) 1))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (> (TOK_TABLE_OR_COL d1) 1) (> (TOK_TABLE_OR_COL d2) 1)))))
STAGE DEPENDENCIES:
Stage-7 is a root stage
Stage-5 depends on stages: Stage-7
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-7
Map Reduce Local Work
Alias -> Map Local Tables:
z:b
Fetch Operator
limit: -1
z:c
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
z:b
TableScan
alias: b
HashTable Sink Operator
condition expressions:
0 {id1} {id2}
1 {id} {d}
handleSkewJoin: false
keys:
0 [Column[id1]]
1 [Column[id]]
Position of Big Table: 0
z:c
TableScan
alias: c
HashTable Sink Operator
condition expressions:
0 {_col5} {_col0} {_col1}
1 {d}
handleSkewJoin: false
keys:
0 []
1 []
Position of Big Table: 0
Stage: Stage-5
Map Reduce
Alias -> Map Operator Tree:
z:a
TableScan
alias: a
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {id1} {id2}
1 {id} {d}
handleSkewJoin: false
keys:
0 [Column[id1]]
1 [Column[id]]
outputColumnNames: _col0, _col1, _col4, _col5
Position of Big Table: 0
Filter Operator
predicate:
expr: (_col1 = _col4)
type: boolean
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {_col5} {_col0} {_col1}
1 {d}
handleSkewJoin: false
keys:
0 []
1 []
outputColumnNames: _col1, _col4, _col5, _col9
Position of Big Table: 0
Filter Operator
predicate:
expr: ((_col1 <= 1) and (_col9 <= 1))
type: boolean
Select Operator
expressions:
expr: _col4
type: string
expr: _col5
type: string
expr: _col1
type: int
expr: _col9
type: int
outputColumnNames: _col0, _col1, _col2, _col3
Filter Operator
predicate:
expr: ((_col2 > 1) or (_col3 > 1))
type: boolean
Select Operator
expressions:
expr: _col0
type: string
expr: _col1
type: string
expr: _col2
type: int
expr: _col3
type: int
outputColumnNames: _col0, _col1, _col2, _col3
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1 {code}
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java e7db370
ql/src/test/queries/clientpositive/ppd_join5.q PRE-CREATION
ql/src/test/results/clientpositive/ppd_join5.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/23271/diff/
Testing
-------
Thanks,
Navis Ryu