Posted to user@hive.apache.org by Aleksei Udatšnõi <a....@gmail.com> on 2014/05/02 14:36:04 UTC

largest table last in joins

Hello,

There is an old recommendation for optimizing Hive joins: put the
largest table last in the join.
http://archive.cloudera.com/cdh/3/hive/language_manual/joins.html

The same recommendation appears in the Programming Hive book.
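
For illustration, the ordering that the linked joins manual describes could look like the sketch below. The table names (small_dim, medium_dim, big_fact) and their columns are hypothetical. Per that manual, in each map/reduce stage the last table in the join sequence is streamed through the reducers while the earlier ones are buffered, which is why the largest table goes last. The STREAMTABLE hint is the documented way to choose the streamed table explicitly.

```sql
-- Hypothetical tables: small_dim and medium_dim are small, big_fact is the largest.
-- The largest table is written last so that it is streamed rather than buffered.
SELECT s.name, m.category, f.amount
FROM small_dim s
JOIN medium_dim m ON (m.id = s.id)
JOIN big_fact f ON (f.id = s.id);

-- Alternative: keep any table order and name the table to stream with a hint.
SELECT /*+ STREAMTABLE(f) */ s.name, m.category, f.amount
FROM big_fact f
JOIN small_dim s ON (s.id = f.id)
JOIN medium_dim m ON (m.id = f.id);
```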

Is this recommendation still valid, or do newer versions of Hive take care of
this optimization automatically?

Best,
Aleksei

Re: largest table last in joins

Posted by Db-Blog <mp...@gmail.com>.
Hi, 
If one big table is joined with a small table and a MAPJOIN hint is specified on the smaller table, is the ordering still required?

We can always explicitly set the auto convert join property to false and enable MAPJOIN hints.
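
As a rough sketch of the setup described here, assuming a Hive version that has the hive.ignore.mapjoin.hint property: hive.auto.convert.join controls automatic conversion to map joins, and hive.ignore.mapjoin.hint must be false for an explicit MAPJOIN hint to take effect. The table names big_fact and small_dim are hypothetical. The sketch only shows the configuration; it does not settle whether table order still matters.

```sql
-- Turn off automatic map-join conversion and honor explicit hints instead.
SET hive.auto.convert.join=false;
SET hive.ignore.mapjoin.hint=false;

-- small_dim (alias d) is the hinted table, so it is loaded into memory on
-- each mapper while big_fact is read by the mappers; the join itself runs map-side.
SELECT /*+ MAPJOIN(d) */ f.id, f.amount, d.name
FROM big_fact f
JOIN small_dim d ON (d.id = f.id);
```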

Please let me know if I am off base on this topic. 

Thanks,
Saurabh

> On 05-May-2014, at 9:19 pm, Alan Gates <ga...@hortonworks.com> wrote:
> 
> Join ordering is not yet part of the Hive optimizer.  There is integration work being done with the Optiq framework that will address this, but it is not complete yet.  Hopefully at least an initial integration will be available in the next Hive release.
> 
> Alan.
> 

Re: largest table last in joins

Posted by Alan Gates <ga...@hortonworks.com>.
Join ordering is not yet part of the Hive optimizer.  There is integration work being done with the Optiq framework that will address this, but it is not complete yet.  Hopefully at least an initial integration will be available in the next Hive release.

Alan.
