Posted to user@hive.apache.org by Banias H <ba...@gmail.com> on 2016/05/31 17:08:45 UTC

How to disable SMB join?

Hi,

Does anybody know if there is a config setting to disable SMB join?

One of our Hive queries failed with ArrayIndexOutOfBoundsException when Tez
is the execution engine. The error seems to be addressed by
https://issues.apache.org/jira/browse/HIVE-13282

We have Hive 1.2 and Tez 0.7 in our cluster and the workaround suggested in
the ticket is to disable SMB join. I searched around and only found the
setting to convert to SMB MapJoin. Any help on disabling SMB join
altogether would be appreciated. Thanks.

-B

RE: How to disable SMB join?

Posted by "Markovitz, Dudu" <dm...@paypal.com>.
Hi

The documentation describes a scenario where SMB join leads to the same error you’ve got.
It claims that changing the order of the tables solves the problem.

Dudu


https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-SMBJoinacrossTableswithDifferentKeys
SMB Join across Tables with Different Keys
If the tables have a differing number of sort keys, for example Table A has 2 SORT columns and Table B has 1 SORT column, then you might get an index out of bounds exception.
The following query results in an index out of bounds exception because emp_person has, say, 1 sort column while emp_pay_history has 2 sort columns.
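For illustration only, the two tables in that scenario might be declared along these lines. Only the table names and empid come from the excerpt; the other columns, the second sort column (pay_date), and the bucket count are hypothetical.

```sql
-- Hypothetical DDL: emp_person is sorted on one column.
CREATE TABLE emp_person (
  empid INT,
  name  STRING          -- assumed column, not in the excerpt
)
CLUSTERED BY (empid) SORTED BY (empid ASC) INTO 8 BUCKETS;

-- Hypothetical DDL: emp_pay_history is sorted on two columns.
CREATE TABLE emp_pay_history (
  empid    INT,
  pay_date STRING,      -- assumed second sort column
  amount   DOUBLE       -- assumed column, not in the excerpt
)
CLUSTERED BY (empid) SORTED BY (empid ASC, pay_date ASC) INTO 8 BUCKETS;
```

With definitions like these, both tables are bucketed on empid but carry a different number of sort columns, which is the mismatch the excerpt describes.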
Error Hive 0.11
SELECT p.*, py.*
FROM emp_person p INNER JOIN emp_pay_history py
ON   p.empid = py.empid

This works fine.
Working query Hive 0.11
SELECT p.*, py.*
FROM emp_pay_history py INNER JOIN emp_person p
ON   p.empid = py.empid
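On the original question of turning SMB join off: the thread itself does not name a setting. A minimal sketch, assuming the Hive properties that control automatic conversion to sort-merge-bucket joins are what the ticket's workaround has in mind, would be to set them to false for the session before running the failing query. Whether this avoids the exception on Hive 1.2 with Tez 0.7 is not confirmed anywhere in this thread.

```sql
-- Assumption: these session-level properties govern SMB join conversion.
-- Do not convert eligible joins to sort-merge-bucket joins automatically.
SET hive.auto.convert.sortmerge.join=false;

-- Do not use the sorted-merge variant of the bucket map join.
SET hive.optimize.bucketmapjoin.sortedmerge=false;

-- Then rerun the query that failed, e.g. the join from the excerpt above.
SELECT p.*, py.*
FROM emp_person p INNER JOIN emp_pay_history py
ON   p.empid = py.empid;
```

Running SET with just the property name (for example, SET hive.auto.convert.sortmerge.join;) prints the current value, which helps check whether a cluster-wide default is overriding the session.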



