You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "dima machlin (JIRA)" <ji...@apache.org> on 2014/07/03 10:08:24 UTC
[jira] [Created] (HIVE-7346) Wrong results caused by hive ppd under specific join condition

dima machlin created HIVE-7346:
----------------------------------

             Summary: Wrong results caused by hive ppd under specific join condition
                 Key: HIVE-7346
                 URL: https://issues.apache.org/jira/browse/HIVE-7346
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.12.0
            Reporter: dima machlin


Assuming two tables :
{code:sql} t1(id1 string,id2 string) , t2 (id string,d int) {code}
t1 contains 1 row : 'a','a'
t2 contains 1 row : 'a',2

The following query : 
{code:sql} select a.*,b.d d1,c.d d2
from t1 a join t2 b on (a.id1=b.id)
join t2 c on (a.id2=b.id)
where b.d <=1 and c.d<=1 {code}

Returns 0 rows as expected because t2.d = 2

Wrapping this query, like so : 
{code:sql} select * from (

select a.*,b.d d1,c.d d2
from t1 a join t2 b on (a.id1=b.id)
join t2 c on (a.id2=b.id)
where b.d <=1 and c.d<=1

) z where d1>1 or d2>1 {code}
Where another filter was add on the columns causes the plan to lack the filter of the "<=1" and return a single row - *Wrong Results*.

The plan is : 
{code:sql}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (<= (. (TOK_TABLE_OR_COL b) d) 1) (<= (. (TOK_TABLE_OR_COL c) d) 1))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (> (TOK_TABLE_OR_COL d1) 1) (> (TOK_TABLE_OR_COL d2) 1)))))

STAGE DEPENDENCIES:
  Stage-7 is a root stage
  Stage-5 depends on stages: Stage-7
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-7
    Map Reduce Local Work
      Alias -> Map Local Tables:
        z:b 
          Fetch Operator
            limit: -1
        z:c 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        z:b 
          TableScan
            alias: b
            HashTable Sink Operator
              condition expressions:
                0 {id1} {id2}
                1 {id} {d}
              handleSkewJoin: false
              keys:
                0 [Column[id1]]
                1 [Column[id]]
              Position of Big Table: 0
        z:c 
          TableScan
            alias: c
            HashTable Sink Operator
              condition expressions:
                0 {_col5} {_col0} {_col1}
                1 {d}
              handleSkewJoin: false
              keys:
                0 []
                1 []
              Position of Big Table: 0

  Stage: Stage-5
    Map Reduce
      Alias -> Map Operator Tree:
        z:a 
          TableScan
            alias: a
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {id1} {id2}
                1 {id} {d}
              handleSkewJoin: false
              keys:
                0 [Column[id1]]
                1 [Column[id]]
              outputColumnNames: _col0, _col1, _col4, _col5
              Position of Big Table: 0
              Filter Operator
                predicate:
                    expr: (_col1 = _col4)
                    type: boolean
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  condition expressions:
                    0 {_col5} {_col0} {_col1}
                    1 {d}
                  handleSkewJoin: false
                  keys:
                    0 []
                    1 []
                  outputColumnNames: _col1, _col4, _col5, _col9
                  Position of Big Table: 0
                  Filter Operator
                    predicate:
                        expr: ((_col1 > 1) or (_col9 > 1))
                        type: boolean
                    Select Operator
                      expressions:
                            expr: _col4
                            type: string
                            expr: _col5
                            type: string
                            expr: _col1
                            type: int
                            expr: _col9
                            type: int
                      outputColumnNames: _col0, _col1, _col2, _col3
                      File Output Operator
                        compressed: false
                        GlobalTableId: 0
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
{code}

Setting : {code:sql} hive.optimize.ppd=false {code}
Results in the following *correct* plan : {code:sql} ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (<= (. (TOK_TABLE_OR_COL b) d) 1) (<= (. (TOK_TABLE_OR_COL c) d) 1))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (> (TOK_TABLE_OR_COL d1) 1) (> (TOK_TABLE_OR_COL d2) 1)))))

STAGE DEPENDENCIES:
  Stage-7 is a root stage
  Stage-5 depends on stages: Stage-7
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-7
    Map Reduce Local Work
      Alias -> Map Local Tables:
        z:b 
          Fetch Operator
            limit: -1
        z:c 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        z:b 
          TableScan
            alias: b
            HashTable Sink Operator
              condition expressions:
                0 {id1} {id2}
                1 {id} {d}
              handleSkewJoin: false
              keys:
                0 [Column[id1]]
                1 [Column[id]]
              Position of Big Table: 0
        z:c 
          TableScan
            alias: c
            HashTable Sink Operator
              condition expressions:
                0 {_col5} {_col0} {_col1}
                1 {d}
              handleSkewJoin: false
              keys:
                0 []
                1 []
              Position of Big Table: 0

  Stage: Stage-5
    Map Reduce
      Alias -> Map Operator Tree:
        z:a 
          TableScan
            alias: a
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {id1} {id2}
                1 {id} {d}
              handleSkewJoin: false
              keys:
                0 [Column[id1]]
                1 [Column[id]]
              outputColumnNames: _col0, _col1, _col4, _col5
              Position of Big Table: 0
              Filter Operator
                predicate:
                    expr: (_col1 = _col4)
                    type: boolean
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  condition expressions:
                    0 {_col5} {_col0} {_col1}
                    1 {d}
                  handleSkewJoin: false
                  keys:
                    0 []
                    1 []
                  outputColumnNames: _col1, _col4, _col5, _col9
                  Position of Big Table: 0
                  Filter Operator
                    predicate:
                        expr: ((_col1 <= 1) and (_col9 <= 1))
                        type: boolean
                    Select Operator
                      expressions:
                            expr: _col4
                            type: string
                            expr: _col5
                            type: string
                            expr: _col1
                            type: int
                            expr: _col9
                            type: int
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Filter Operator
                        predicate:
                            expr: ((_col2 > 1) or (_col3 > 1))
                            type: boolean
                        Select Operator
                          expressions:
                                expr: _col0
                                type: string
                                expr: _col1
                                type: string
                                expr: _col2
                                type: int
                                expr: _col3
                                type: int
                          outputColumnNames: _col0, _col1, _col2, _col3
                          File Output Operator
                            compressed: false
                            GlobalTableId: 0
                            table:
                                input format: org.apache.hadoop.mapred.TextInputFormat
                                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1 {code}






--
This message was sent by Atlassian JIRA
(v6.2#6252)