Posted to user@hive.apache.org by Aleksei Udatšnõi <a....@gmail.com> on 2014/05/02 14:36:04 UTC
largest table last in joins
Hello,
There is an old recommendation for optimizing Hive joins: put the
largest table last in the join.
http://archive.cloudera.com/cdh/3/hive/language_manual/joins.html
The same recommendation appears in the book Programming Hive.
Is this recommendation still valid, or do newer versions of Hive take care
of this optimization automatically?
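
For context, the Hive language manual motivates this recommendation by noting that in each map/reduce stage of a join the last table in the sequence is streamed through the reducers while the others are buffered. Below is a minimal Python sketch of that idea, not Hive's actual implementation. The function name simulate_reduce_join and the toy tables small and big are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def simulate_reduce_join(tables):
    """Inner-join `tables` (lists of (key, value) rows) on key.

    Toy model of a reduce-side join: for each join key, rows from every
    table except the last are held in memory, and rows of the last table
    are streamed past them.
    Returns (joined_rows, peak_rows_buffered_for_a_single_key).
    """
    # Group each table's rows by join key (stands in for the shuffle phase).
    grouped = []
    for table in tables:
        by_key = defaultdict(list)
        for key, value in table:
            by_key[key].append(value)
        grouped.append(by_key)

    *buffered, streamed = grouped

    # Largest number of rows that must be held in memory for any one key.
    keys = set().union(*(g.keys() for g in buffered))
    peak = max((sum(len(g.get(k, [])) for g in buffered) for k in keys), default=0)

    joined = []
    for key, stream_rows in streamed.items():
        held = [g.get(key, []) for g in buffered]
        for stream_value in stream_rows:      # streamed one row at a time
            for combo in product(*held):      # paired with the buffered rows
                joined.append((key, *combo, stream_value))
    return joined, peak


# Hypothetical data: a tiny table and a large, skewed one.
small = [(1, "s1"), (2, "s2")]
big = [(1, f"b{i}") for i in range(1000)] + [(2, "b_x")]

rows_a, peak_a = simulate_reduce_join([small, big])  # largest table last
rows_b, peak_b = simulate_reduce_join([big, small])  # largest table first
print(len(rows_a), peak_a)  # 1001 1
print(len(rows_b), peak_b)  # 1001 1000
```

Both orderings produce the same number of joined rows, but in this toy model the per-key buffer is 1000 times larger when the big table comes first.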
Best,
Aleksei
Re: largest table last in joins
Posted by Db-Blog <mp...@gmail.com>.
Hi,
If one big table is joined with a small table and a MAPJOIN hint is specified on the smaller table, is the ordering still required?
We can always explicitly set the auto convert join property to false and enable mapjoin hints.
Please let me know if I am off base on this topic.
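
To make the MAPJOIN idea concrete, here is a minimal Python sketch of a generic map-side (hash) join, in which the small table is loaded into memory and the big table is streamed against it. It illustrates the general technique only and is not Hive's implementation. The function map_side_join and the sample rows are hypothetical. The properties referred to above are presumably hive.auto.convert.join and hive.ignore.mapjoin.hint.

```python
from collections import defaultdict


def map_side_join(small, big_rows):
    """Inner-join streamed big-table rows against an in-memory small table.

    `small` is an iterable of (key, value) rows that fits in memory;
    `big_rows` can be any iterable, including a generator, and is never
    materialized as a whole.
    """
    # Build the in-memory lookup from the small (hinted) table.
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)

    # Stream the big table; each row is matched and then discarded.
    for key, big_value in big_rows:
        for small_value in lookup.get(key, ()):
            yield key, small_value, big_value


# Hypothetical data: a small dimension table and a streamed fact table.
dims = [(1, "US"), (2, "EE")]
facts = ((i % 3, f"order{i}") for i in range(6))

result = list(map_side_join(dims, facts))
print(result)
```

In this sketch only the small table is ever held in memory, whichever side of the join it is written on, which is the intuition behind the question above. Whether Hive behaves the same way for a given version and query should be checked against its documentation.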
Thanks,
Saurabh
> On 05-May-2014, at 9:19 pm, Alan Gates <ga...@hortonworks.com> wrote:
>
> Join ordering is not yet part of the Hive optimizer. There is integration work being done with the Optiq framework that will address this, but it is not complete yet. Hopefully at least an initial integration will be available in the next Hive release.
>
> Alan.
Re: largest table last in joins
Posted by Alan Gates <ga...@hortonworks.com>.
Join ordering is not yet part of the Hive optimizer. There is integration work being done with the Optiq framework that will address this, but it is not complete yet. Hopefully at least an initial integration will be available in the next Hive release.
Alan.