Posted to user@hive.apache.org by Harsha HN <99...@gmail.com> on 2015/04/16 08:38:53 UTC

Question on MAPJOIN Vs JOIN performance

Hi All,

I went through the Facebook engineering page below:
https://www.facebook.com/notes/facebook-engineering/join-optimization-in-apache-hive/470667928919

I set the following to enable automatic conversion of joins:
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=1000000000;    (about 1 GB)

I observed that some queries ran 2X faster with MAP JOIN than with
Common Join, but in other cases MAP JOIN was 3X slower than Common Join.

Any thoughts on what might be slowing down MAP JOIN in some cases?

I have a 40-node cluster, so plenty of RAM is available.

Thanks,
Harsha
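
A minimal sketch of how one might compare the two join strategies for a single query, offered as an illustration under stated assumptions rather than a diagnosis. The table and column names (orders, customers, customer_id, order_id, name) are hypothetical; the property names are standard Hive settings, and 25000000 is the documented default for hive.mapjoin.smalltable.filesize. In a map join the small table is loaded into an in-memory hash table, so a threshold near 1 GB may admit tables whose hash table is costly to build and distribute; whether that explains the slow cases above is not established in this thread.

```sql
-- Hypothetical tables: orders (large) and customers (small).

-- Run 1: auto conversion with the default small-table threshold (25 MB).
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=25000000;

EXPLAIN
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- Run 2: auto conversion disabled, so the same query runs as a common join.
set hive.auto.convert.join=false;

EXPLAIN
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```

In the EXPLAIN output, a converted join shows a Map Join Operator, while a common join shows a Join Operator in the reduce phase. Comparing the two plans, and then the run times, for one of the slow queries would show whether the conversion is actually being applied to it.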

Re: Question on MAPJOIN Vs JOIN performance

Posted by Harsha HN <99...@gmail.com>.
Hi,

Thanks for your reply. I will go through the link.
By the way, my Hive version is 0.12.

Thanks,
Harsha

On Fri, Apr 17, 2015 at 4:16 AM, Lefty Leverenz <le...@gmail.com>
wrote:

> Harsha, that document is from 2010.  What version of Hive are you using?
>
> Here's some up-to-date information in the Hive wiki:  Join Optimization
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization>
> .
>
> -- Lefty

Re: Question on MAPJOIN Vs JOIN performance

Posted by Lefty Leverenz <le...@gmail.com>.
Harsha, that document is from 2010.  What version of Hive are you using?

Here's some up-to-date information in the Hive wiki:  Join Optimization
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization>
.

-- Lefty
