You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Pavan Srinivas (JIRA)" <ji...@apache.org> on 2015/02/23 20:25:11 UTC

[jira] [Created] (HIVE-9753) Wrong results when using multiple levels of Joins. When table alias of one of the table is null with left outer joins.

Pavan Srinivas created HIVE-9753:
------------------------------------

             Summary: Wrong results when using multiple levels of Joins. When table alias of one of the table is null with left outer joins.  
                 Key: HIVE-9753
                 URL: https://issues.apache.org/jira/browse/HIVE-9753
             Project: Hive
          Issue Type: Bug
            Reporter: Pavan Srinivas
            Priority: Critical


Let take scenario, where the tables are:
{code}
drop table table1;
CREATE TABLE table1(
  col1 string,
  col2 string,
  col3 string,
  col4 string
  )
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

drop table table2;
CREATE  TABLE table2(
  col1 string,
  col2 bigint,
  col3 string,
  col4 string
  )
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';


drop table table3;
CREATE  TABLE table3(
  col1 string,
  col2 int,
  col3 int,
  col4 string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}

Query with wrong results:
{code}

SELECT t1.col1 AS dummy,
    t1.expected_column AS expected_column,
    t2.col4
FROM (
    SELECT col1,
        '23-11111',
        '23-13' as three,
        col4 AS expected_column

    FROM table1
    ) t1
JOIN table2 t2
    ON cast(t2.col1 as string) = cast(t1.col1 as string)
LEFT OUTER JOIN
    (SELECT col4, col1
    FROM table3
    ) t3
    ON t2.col4 = t3.col1  
;

{code}

and explain output: 
{code}
STAGE DEPENDENCIES:
  Stage-7 is a root stage
  Stage-5 depends on stages: Stage-7
  Stage-0 depends on stages: Stage-5

STAGE PLANS:
  Stage: Stage-7
    Map Reduce Local Work
      Alias -> Map Local Tables:
        t1:table1
          Fetch Operator
            limit: -1
        t3:table3
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        t1:table1
          TableScan
            alias: table1
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            Filter Operator
              predicate: col1 is not null (type: boolean)
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              Select Operator
                expressions: col1 (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                HashTable Sink Operator
                  condition expressions:
                    0
                    1 {col4}
                  keys:
                    0 _col0 (type: string)
                    1 col1 (type: string)
        t3:table3
          TableScan
            alias: table3
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            Select Operator
              expressions: col1 (type: string)
              outputColumnNames: _col1
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              HashTable Sink Operator
                condition expressions:
                  0 {_col0} {_col7} {_col7}
                  1
                keys:
                  0 _col7 (type: string)
                  1 _col1 (type: string)

  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t2
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            Filter Operator
              predicate: col1 is not null (type: boolean)
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {_col0}
                  1 {col4}
                keys:
                  0 _col0 (type: string)
                  1 col1 (type: string)
                outputColumnNames: _col0, _col7
                Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                Map Join Operator
                  condition map:
                       Left Outer Join0 to 1
                  condition expressions:
                    0 {_col0} {_col7} {_col7}
                    1
                  keys:
                    0 _col7 (type: string)
                    1 _col1 (type: string)
                  outputColumnNames: _col0, _col3, _col7
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: string), _col3 (type: string), _col7 (type: string)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

After the patch, explain output(correct)

{code}
STAGE DEPENDENCIES:
  Stage-7 is a root stage
  Stage-5 depends on stages: Stage-7
  Stage-0 depends on stages: Stage-5

STAGE PLANS:
  Stage: Stage-7
    Map Reduce Local Work
      Alias -> Map Local Tables:
        t1:table1
          Fetch Operator
            limit: -1
        t3:table3
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        t1:table1
          TableScan
            alias: table1
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            Filter Operator
              predicate: col1 is not null (type: boolean)
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              Select Operator
                expressions: col1 (type: string), col4 (type: string)
                outputColumnNames: _col0, _col3
                Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                HashTable Sink Operator
                  condition expressions:
                    0 {_col3}
                    1 {col4}
                  keys:
                    0 _col0 (type: string)
                    1 col1 (type: string)
        t3:table3
          TableScan
            alias: table3
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            Select Operator
              expressions: col1 (type: string)
              outputColumnNames: _col1
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              HashTable Sink Operator
                condition expressions:
                  0 {_col0} {_col3} {_col7}
                  1
                keys:
                  0 _col7 (type: string)
                  1 _col1 (type: string)

  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t2
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            Filter Operator
              predicate: col1 is not null (type: boolean)
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {_col0} {_col3}
                  1 {col4}
                keys:
                  0 _col0 (type: string)
                  1 col1 (type: string)
                outputColumnNames: _col0, _col3, _col7
                Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                Map Join Operator
                  condition map:
                       Left Outer Join0 to 1
                  condition expressions:
                    0 {_col0} {_col3} {_col7}
                    1
                  keys:
                    0 _col7 (type: string)
                    1 _col1 (type: string)
                  outputColumnNames: _col0, _col3, _col7
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: string), _col3 (type: string), _col7 (type: string)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
Notice the difference of "_col3" between the two explain outputs. 

Output of the query:(WRONG)
{code}
one	bad_data_1	bad_data_1
one	bad_data_1	bad_data_1
one	bad_data_1	bad_data_1
one	bad_data_1	bad_data_1
two	bad_data_2	bad_data_2
two	bad_data_2	bad_data_2
two	bad_data_2	bad_data_2
two	bad_data_2	bad_data_2
one	bad_data_1	bad_data_1
one	bad_data_1	bad_data_1
one	bad_data_1	bad_data_1
one	bad_data_1	bad_data_1
two	bad_data_2	bad_data_2
two	bad_data_2	bad_data_2
two	bad_data_2	bad_data_2
two	bad_data_2	bad_data_2
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)