You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2016/12/01 11:51:59 UTC
[jira] [Commented] (HIVE-15327) Outerjoin might produce wrong
result depending on joinEmitInterval value
[ https://issues.apache.org/jira/browse/HIVE-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711771#comment-15711771 ]
Jesus Camacho Rodriguez commented on HIVE-15327:
------------------------------------------------
[~ashutoshc], could you review the patch?
Issue is not reproducible with Tez join operators implementation, but I thought it was worth to add the tests so we do not regress in the future.
> Outerjoin might produce wrong result depending on joinEmitInterval value
> ------------------------------------------------------------------------
>
> Key: HIVE-15327
> URL: https://issues.apache.org/jira/browse/HIVE-15327
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 1.3.0, 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Critical
> Attachments: HIVE-15327.patch
>
>
> If joinEmitInterval is smaller than the group size, outerjoins might produce records with NULL appended values multiple times (once per group).
> HIVE-4689 targeted the same problem. However, the fix does not seem to cover all cases (in particular, it will not apply to left outer joins with filter conditions on the left input). The solution in HIVE-4689 was to disable (override) joinEmitInterval value for those cases. This fix follows the same approach.
> To reproduce the problem:
> {code}
> set hive.strict.checks.cartesian.product=false;
> set hive.join.emit.interval=1;
> CREATE TABLE test1 (key INT, value INT, col_1 STRING);
> INSERT INTO test1 VALUES (99, 0, 'Alice');
> INSERT INTO test1 VALUES (99, 2, 'Mat');
> INSERT INTO test1 VALUES (100, 1, 'Bob');
> INSERT INTO test1 VALUES (101, 2, 'Car');
> CREATE TABLE test2 (key INT, value INT, col_2 STRING);
> INSERT INTO test2 VALUES (102, 2, 'Del');
> INSERT INTO test2 VALUES (103, 2, 'Ema');
> INSERT INTO test2 VALUES (104, 3, 'Fli');
> -- Equi-condition and condition on one input (left outer join)
> SELECT *
> FROM test1 LEFT OUTER JOIN test2
> ON (test1.value=test2.value AND test1.key between 100 and 102)
> LIMIT 10;
> -- Condition on one input (left outer join)
> SELECT *
> FROM test1 LEFT OUTER JOIN test2
> ON (test1.key between 100 and 102)
> LIMIT 10;
> {code}
> For the *first* query, current (incorrect) result is:
> {noformat}
> 99 0 Alice NULL NULL NULL
> 100 1 Bob NULL NULL NULL
> 101 2 Car 103 2 Ema
> 99 2 Mat NULL NULL NULL
> 101 2 Car 102 2 Del
> 99 2 Mat NULL NULL NULL
> {noformat}
> Expected (correct) result is:
> {noformat}
> 99 0 Alice NULL NULL NULL
> 100 1 Bob NULL NULL NULL
> 101 2 Car 103 2 Ema
> 101 2 Car 102 2 Del
> 99 2 Mat NULL NULL NULL
> {noformat}
> For the *second* query, current (incorrect) result is:
> {noformat}
> 101 2 Car 104 3 Fli
> 100 1 Bob 104 3 Fli
> 99 2 Mat NULL NULL NULL
> 99 0 Alice NULL NULL NULL
> 101 2 Car 103 2 Ema
> 100 1 Bob 103 2 Ema
> 99 2 Mat NULL NULL NULL
> 99 0 Alice NULL NULL NULL
> 101 2 Car 102 2 Del
> 100 1 Bob 102 2 Del
> {noformat}
> Expected (correct) result is:
> {noformat}
> 101 2 Car 104 3 Fli
> 101 2 Car 103 2 Ema
> 101 2 Car 102 2 Del
> 100 1 Bob 104 3 Fli
> 100 1 Bob 103 2 Ema
> 100 1 Bob 102 2 Del
> 99 2 Mat NULL NULL NULL
> 99 0 Alice NULL NULL NULL
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)