You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Jeff Zhang <zj...@gmail.com> on 2016/03/01 05:29:07 UTC
Fwd: Wrong column is picked in HIVE 2.0.0 + TEZ 0.8.2 left join
+ hive mail list, more likely hive issue.
---------- Forwarded message ----------
From: GAO Chi <ch...@microfun.com>
Date: Tue, Mar 1, 2016 at 12:24 PM
Subject: Wrong column is picked in HIVE 2.0.0 + TEZ 0.8.2 left join
To: user@tez.apache.org
Hi all,
We encountered a strange behavior after upgrading to HIVE 2.0.0 + TEZ
0.8.2.
I simplified our query to this:
SELECT
a.key,
a.a_one,
b.b_one,
a.a_zero,
b.b_zero
FROM
(
SELECT
11 key,
0 confuse_you,
1 a_one,
0 a_zero
) a
LEFT JOIN
(
SELECT
11 key,
0 confuse_you,
1 b_one,
0 b_zero
) b
ON a.key = b.key
;
Above query generates this unexpected result:
INFO : Status: Running (Executing on YARN cluster with App id
application_1456723490535_3653)
INFO : Map 1: 0/1 Map 2: 0/1
INFO : Map 1: 0/1 Map 2: 0(+1)/1
INFO : Map 1: 0(+1)/1 Map 2: 0(+1)/1
INFO : Map 1: 0(+1)/1 Map 2: 1/1
INFO : Map 1: 1/1 Map 2: 1/1
INFO : Completed executing
command(queryId=hive_20160301115630_0a0dbee5-ba4b-45e7-b027-085f655640fd);
Time taken: 10.225 seconds
INFO : OK
+--------+----------+----------+-----------+-----------+--+
| a.key | a.a_one | b.b_one | a.a_zero | b.b_zero |
+--------+----------+----------+-----------+-----------+--+
| 11 | 1 | 0 | 0 | 1 |
+--------+----------+----------+-----------+-----------+--+
If you change the constant value of subquery-b’s confuse_you column from 0
to 2, the problem disappears. The plan returned from EXPLAIN shows the
incorrect one is picking _col1 and _col2, while the correct one is picking
_col2 and _col3 form sub query b.
Seems it cannot distinguish 2 columns with same constant value?
Anyone encountered similar problem?
Thanks!
Chi
--
Best Regards
Jeff Zhang