You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Navis (JIRA)" <ji...@apache.org> on 2014/07/04 03:07:34 UTC
[jira] [Assigned] (HIVE-7346) Wrong results caused by hive ppd
under specific join condition
[ https://issues.apache.org/jira/browse/HIVE-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navis reassigned HIVE-7346:
---------------------------
Assignee: Navis
> Wrong results caused by hive ppd under specific join condition
> --------------------------------------------------------------
>
> Key: HIVE-7346
> URL: https://issues.apache.org/jira/browse/HIVE-7346
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: dima machlin
> Assignee: Navis
>
> Assuming two tables :
> {code:sql} t1(id1 string,id2 string) , t2 (id string,d int) {code}
> t1 contains 1 row : 'a','a'
> t2 contains 1 row : 'a',2
> The following query :
> {code:sql} select a.*,b.d d1,c.d d2
> from t1 a join t2 b on (a.id1=b.id)
> join t2 c on (a.id2=b.id)
> where b.d <=1 and c.d<=1 {code}
> Returns 0 rows as expected because t2.d = 2
> Wrapping this query, like so :
> {code:sql} select * from (
> select a.*,b.d d1,c.d d2
> from t1 a join t2 b on (a.id1=b.id)
> join t2 c on (a.id2=b.id)
> where b.d <=1 and c.d<=1
> ) z where d1>1 or d2>1 {code}
> Where another filter was add on the columns causes the plan to lack the filter of the "<=1" and return a single row - *Wrong Results*.
> The plan is :
> {code:sql}
> ABSTRACT SYNTAX TREE:
> (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (<= (. (TOK_TABLE_OR_COL b) d) 1) (<= (. (TOK_TABLE_OR_COL c) d) 1))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (> (TOK_TABLE_OR_COL d1) 1) (> (TOK_TABLE_OR_COL d2) 1)))))
> STAGE DEPENDENCIES:
> Stage-7 is a root stage
> Stage-5 depends on stages: Stage-7
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-7
> Map Reduce Local Work
> Alias -> Map Local Tables:
> z:b
> Fetch Operator
> limit: -1
> z:c
> Fetch Operator
> limit: -1
> Alias -> Map Local Operator Tree:
> z:b
> TableScan
> alias: b
> HashTable Sink Operator
> condition expressions:
> 0 {id1} {id2}
> 1 {id} {d}
> handleSkewJoin: false
> keys:
> 0 [Column[id1]]
> 1 [Column[id]]
> Position of Big Table: 0
> z:c
> TableScan
> alias: c
> HashTable Sink Operator
> condition expressions:
> 0 {_col5} {_col0} {_col1}
> 1 {d}
> handleSkewJoin: false
> keys:
> 0 []
> 1 []
> Position of Big Table: 0
> Stage: Stage-5
> Map Reduce
> Alias -> Map Operator Tree:
> z:a
> TableScan
> alias: a
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {id1} {id2}
> 1 {id} {d}
> handleSkewJoin: false
> keys:
> 0 [Column[id1]]
> 1 [Column[id]]
> outputColumnNames: _col0, _col1, _col4, _col5
> Position of Big Table: 0
> Filter Operator
> predicate:
> expr: (_col1 = _col4)
> type: boolean
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col5} {_col0} {_col1}
> 1 {d}
> handleSkewJoin: false
> keys:
> 0 []
> 1 []
> outputColumnNames: _col1, _col4, _col5, _col9
> Position of Big Table: 0
> Filter Operator
> predicate:
> expr: ((_col1 > 1) or (_col9 > 1))
> type: boolean
> Select Operator
> expressions:
> expr: _col4
> type: string
> expr: _col5
> type: string
> expr: _col1
> type: int
> expr: _col9
> type: int
> outputColumnNames: _col0, _col1, _col2, _col3
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Local Work:
> Map Reduce Local Work
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {code}
> Setting : {code:sql} hive.optimize.ppd=false {code}
> Results in the following *correct* plan : {code:sql} ABSTRACT SYNTAX TREE:
> (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (<= (. (TOK_TABLE_OR_COL b) d) 1) (<= (. (TOK_TABLE_OR_COL c) d) 1))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (> (TOK_TABLE_OR_COL d1) 1) (> (TOK_TABLE_OR_COL d2) 1)))))
> STAGE DEPENDENCIES:
> Stage-7 is a root stage
> Stage-5 depends on stages: Stage-7
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-7
> Map Reduce Local Work
> Alias -> Map Local Tables:
> z:b
> Fetch Operator
> limit: -1
> z:c
> Fetch Operator
> limit: -1
> Alias -> Map Local Operator Tree:
> z:b
> TableScan
> alias: b
> HashTable Sink Operator
> condition expressions:
> 0 {id1} {id2}
> 1 {id} {d}
> handleSkewJoin: false
> keys:
> 0 [Column[id1]]
> 1 [Column[id]]
> Position of Big Table: 0
> z:c
> TableScan
> alias: c
> HashTable Sink Operator
> condition expressions:
> 0 {_col5} {_col0} {_col1}
> 1 {d}
> handleSkewJoin: false
> keys:
> 0 []
> 1 []
> Position of Big Table: 0
> Stage: Stage-5
> Map Reduce
> Alias -> Map Operator Tree:
> z:a
> TableScan
> alias: a
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {id1} {id2}
> 1 {id} {d}
> handleSkewJoin: false
> keys:
> 0 [Column[id1]]
> 1 [Column[id]]
> outputColumnNames: _col0, _col1, _col4, _col5
> Position of Big Table: 0
> Filter Operator
> predicate:
> expr: (_col1 = _col4)
> type: boolean
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col5} {_col0} {_col1}
> 1 {d}
> handleSkewJoin: false
> keys:
> 0 []
> 1 []
> outputColumnNames: _col1, _col4, _col5, _col9
> Position of Big Table: 0
> Filter Operator
> predicate:
> expr: ((_col1 <= 1) and (_col9 <= 1))
> type: boolean
> Select Operator
> expressions:
> expr: _col4
> type: string
> expr: _col5
> type: string
> expr: _col1
> type: int
> expr: _col9
> type: int
> outputColumnNames: _col0, _col1, _col2, _col3
> Filter Operator
> predicate:
> expr: ((_col2 > 1) or (_col3 > 1))
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: string
> expr: _col1
> type: string
> expr: _col2
> type: int
> expr: _col3
> type: int
> outputColumnNames: _col0, _col1, _col2, _col3
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Local Work:
> Map Reduce Local Work
> Stage: Stage-0
> Fetch Operator
> limit: -1 {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)