Posted to issues@hive.apache.org by "GuangMing Lu (Jira)" <ji...@apache.org> on 2022/03/09 13:16:00 UTC

[jira] [Updated] (HIVE-26018) The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR

     [ https://issues.apache.org/jira/browse/HIVE-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

GuangMing Lu updated HIVE-26018:
--------------------------------
    Description: 
The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and the result is not correct. For example:

CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;

insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');

SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE  T2_n1x b (b.key);

Hive on Tez result: wrong

+--------+--------+
| a.key  | b.key  |
+--------+--------+
| aaa    | aaa    |
| bbb    | NULL   |
| ccc    | ccc    |
| NULL   | ddd    |
+--------+--------+

Hive on MR result: right

+--------+--------+
| a.key  | b.key  |
+--------+--------+
| aaa    | aaa    |
| bbb    | NULL   |
| ccc    | ccc    |
+--------+--------+

SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);

Hive on Tez result: wrong

+--------+--------+
| a.key  | b.key  |
+--------+--------+
| aaa    | aaa    |
| bbb    | NULL   |
| ccc    | ccc    |
| NULL   | ddd    |
+--------+--------+

Hive on MR result: right

+--------+--------+
| a.key  | b.key  |
+--------+--------+
| aaa    | aaa    |
| ccc    | ccc    |
+--------+--------+

 

  was:
The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and the result is not correct, for example:

CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;

insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');

SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE  T2_n1x b (b.key);

Hive on Tez result: wrong

+--------+--------+
| a.key  | b.key  |
+--------+--------+
| aaa    | aaa    |
| bbb    | NULL   |
| ccc    | ccc    |
| NULL   | ddd    |
+--------+--------+
Hive on MR result: right

+--------+--------+
| a.key  | b.key  |
+--------+--------+
| aaa    | aaa    |
| bbb    | NULL   |
| ccc    | ccc    |
+--------+--------+

SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);

Hive on Tez result: wrong

+--------+--------+
| a.key  | b.key  |
+--------+--------+
| aaa    | aaa    |
| bbb    | NULL   |
| ccc    | ccc    |
| NULL   | ddd    |
+--------+--------+

Hive on MR result: right

+--------+--------+
| a.key  | b.key  |
+--------+--------+
| aaa    | aaa    |
| ccc    | ccc    |
+--------+--------+

 


> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR
> -----------------------------------------------------------------------
>
>                 Key: HIVE-26018
>                 URL: https://issues.apache.org/jira/browse/HIVE-26018
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>    Affects Versions: 3.1.0, 4.0.0
>            Reporter: GuangMing Lu
>            Priority: Major
>         Attachments: image-2022-03-09-21-08-17-835.png
>
>
> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and the result is not correct. For example:
> CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
> CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;
> insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
> insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');
> SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE  T2_n1x b (b.key);
> Hive on Tez result: wrong
> +--------+--------+
> | a.key  | b.key  |
> +--------+--------+
> | aaa    | aaa    |
> | bbb    | NULL   |
> | ccc    | ccc    |
> | NULL   | ddd    |
> +--------+--------+
> Hive on MR result: right
> +--------+--------+
> | a.key  | b.key  |
> +--------+--------+
> | aaa    | aaa    |
> | bbb    | NULL   |
> | ccc    | ccc    |
> +--------+--------+
> SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);
> Hive on Tez result: wrong
> +--------+--------+
> | a.key  | b.key  |
> +--------+--------+
> | aaa    | aaa    |
> | bbb    | NULL   |
> | ccc    | ccc    |
> | NULL   | ddd    |
> +--------+--------+
> Hive on MR result: right
> +--------+--------+
> | a.key  | b.key  |
> +--------+--------+
> | aaa    | aaa    |
> | ccc    | ccc    |
> +--------+--------+
>  
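
To compare the two engines side by side, the queries from the description can be rerun in a single session after switching hive.execution.engine. This is only a minimal sketch: it assumes the tables T1_n1x and T2_n1x from the report already exist and are populated, and that the deployment still accepts the mr engine, which is deprecated in recent Hive releases. It makes no claim about which output is the correct one.

```sql
-- Assumption: hive.execution.engine may be changed at session level
-- (it is a standard Hive property; some clusters restrict it).

-- Run both UNIQUEJOIN variants on Tez.
SET hive.execution.engine=tez;
SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE T2_n1x b (b.key);
SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);

-- Run the same two queries on MapReduce for comparison.
SET hive.execution.engine=mr;
SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE T2_n1x b (b.key);
SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);
```

Prefixing either query with EXPLAIN under each engine setting would show the plan that engine generates, which may help narrow down where the results diverge.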



--
This message was sent by Atlassian Jira
(v8.20.1#820001)