Posted to user@hive.apache.org by hadoop n00b <ne...@gmail.com> on 2011/01/20 08:00:50 UTC

Mapjoin Usage Question

Hi,

How do I use the mapjoin hint in a query?

Say, I have two tables t1 and t2 where t2 is the smaller table. Do I specify
t2 in the mapjoin hint?

select /*+ mapjoin(b)*/ * from t1 a join t2 b on (a.id = b.id)

If I am joining two smaller tables, can I list both of them in the
mapjoin hint, e.g. /*+ mapjoin(b,c) */?

I am unable to find much documentation on this. I am using CDH2 with Hive
0.4.1.

Thanks!

Re: Mapjoin Usage Question

Posted by Ajo Fod <aj...@gmail.com>.
It probably depends on how big the big table is, i.e. whether it can
be held in memory.

-Ajo

On Wed, Jan 19, 2011 at 11:23 PM, hadoop n00b <ne...@gmail.com> wrote:
> Thanks Leo,
>
> Does the smaller table go into the mapjoin hint? Actually, when I ran a test
> query with the bigger table in the hint, it performed better.

Re: Mapjoin Usage Question

Posted by hadoop n00b <ne...@gmail.com>.
Thanks Leo,

Does the smaller table go into the mapjoin hint? Actually, when I ran a test
query with the bigger table in the hint, it performed better.

On Thu, Jan 20, 2011 at 12:40 PM, Leo Alekseyev <dn...@gmail.com> wrote:

> You can only specify one table, and make sure to include its name,
> i.e. /*+ mapjoin(t2)*/.   For more info see
> http://wiki.apache.org/hadoop/Hive/JoinOptimization and
> http://www.slideshare.net/aiolos127/join-optimization-in-hive.
>
> Also, you are using a relatively old version of Hive, but I'll let
> more experienced people on this list decide whether that's a problem
> :)

Re: Mapjoin Usage Question

Posted by Leo Alekseyev <dn...@gmail.com>.
You can only specify one table, and make sure to include its name,
i.e. /*+ mapjoin(t2)*/. For more info see
http://wiki.apache.org/hadoop/Hive/JoinOptimization and
http://www.slideshare.net/aiolos127/join-optimization-in-hive.

Also, you are using a relatively old version of Hive, but I'll let
more experienced people on this list decide whether that's a problem
:)
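
A minimal sketch of the single-table form described above, assuming t1 is
the large table, t2 is the small one, and both have an id column as in the
original question. The selected columns are only illustrative, and I have
not checked this against Hive 0.4.1 specifically:

```sql
-- Hint names t2, the table assumed small enough to be loaded into memory;
-- t1 is then streamed through the mappers and matched against it.
-- No aliases are used, so the hint refers to the table by its name.
SELECT /*+ MAPJOIN(t2) */ t1.id, t2.id
FROM t1
JOIN t2 ON (t1.id = t2.id);
```

If the hint is honored, running EXPLAIN on the query should show the join
happening in the map stage, which is a quick way to check which table ended
up being the in-memory side.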

On Thu, Jan 20, 2011 at 2:00 AM, hadoop n00b <ne...@gmail.com> wrote:
> Hi,
>
> How do I use the mapjoin hint in a query?
>
> Say, I have two tables t1 and t2 where t2 is the smaller table. Do I specify
> t2 in the mapjoin hint?
>
> select /*+ mapjoin(b)*/ * from t1 a join t2 b on (a.id = b.id)
>
> If I am joining two smaller tables, can I specify two clauses in the
> mapjoin? /*+mapjoin(b,c)*/?
>
> I am unable to find much documentation on this. I am using CDH2 with Hive
> 0.4.1
>
> Thanks!