Posted to user@hive.apache.org by Sarah Sproehnle <sa...@cloudera.com> on 2010/05/01 02:25:10 UTC

mapjoin execution plan

Hi,

I am confused by the execution plan for a query.  First I did:
SELECT * FROM t1 JOIN t2 ON (t1.a = t2.a) ORDER BY t1.b;

As expected, EXPLAIN reported that there would be 2 MR stages (one for
the reduce-side join and one for the order by).
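A minimal in-memory Python sketch of why this plan has two stages. It is a simulation under stated assumptions, not Hive's code. Rows are dicts, and the helpers (`run_mr`, `join_mapper`, `join_reducer`, `sort_mapper`, `sort_reducer`) and the sample data (including the column `c` in t2) are hypothetical. The join shuffles rows on `a`, but ORDER BY needs a total order on `t1.b`. That requires a second shuffle, which the sketch routes to a single reducer the way Hive's ORDER BY does.

```python
from collections import defaultdict


def run_mr(rows, mapper, reducer):
    """Simulate one MapReduce stage: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    out = []
    for key in sorted(groups):  # the shuffle delivers keys to reducers in sorted order
        out.extend(reducer(key, groups[key]))
    return out


# Stage 1: reduce-side join. Each row is tagged with its source table and keyed on a.
def join_mapper(tagged):
    table, row = tagged
    yield row["a"], (table, row)


def join_reducer(_key, values):
    left = [r for t, r in values if t == "t1"]
    right = [r for t, r in values if t == "t2"]
    for l in left:
        for r in right:
            yield {"t1": l, "t2": r}


# Stage 2: ORDER BY t1.b. A constant key sends every row to one reducer.
def sort_mapper(joined):
    yield 0, joined


def sort_reducer(_key, values):
    return sorted(values, key=lambda j: j["t1"]["b"])


t1 = [{"a": 1, "b": "z"}, {"a": 2, "b": "x"}]
t2 = [{"a": 1, "c": 10}, {"a": 2, "c": 20}, {"a": 2, "c": 21}]
tagged = [("t1", r) for r in t1] + [("t2", r) for r in t2]

stage1 = run_mr(tagged, join_mapper, join_reducer)  # MR stage 1: join on a
stage2 = run_mr(stage1, sort_mapper, sort_reducer)  # MR stage 2: ORDER BY t1.b
print([(j["t1"]["a"], j["t1"]["b"], j["t2"]["c"]) for j in stage2])
# prints [(2, 'x', 20), (2, 'x', 21), (1, 'z', 10)]
```

The two `run_mr` calls use different shuffle keys (`a`, then a constant), so the sketch cannot merge them into one stage.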

So I added a MAPJOIN(t1) hint and expected a single MR stage, but what
I got was (I think) a map-only job and a map-reduce job.  Is this
normal?

Explain plan: http://pastebin.com/pEyT22vC

Thanks,
Sarah
-- 
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera

RE: mapjoin execution plan

Posted by Namit Jain <nj...@facebook.com>.
Right now, mapjoin is not fully optimized, so this is the expected behavior.
The MapJoin writes its results to a temp file, and the ORDER BY is then processed from that file.
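A rough Python sketch of the plan described above, under the same simulation assumptions. The helper `map_only_join`, the split layout and the data are hypothetical and are not Hive's implementation. With MAPJOIN(t1), t1 is the small table. Each mapper loads it into an in-memory hash table and probes that table while streaming its split of t2. There is no shuffle, so the join runs as a map-only job that writes to a temp file (a list here). The ORDER BY then reads that file in a separate job.

```python
from collections import defaultdict


def map_only_join(small_rows, big_split):
    """Map-side (broadcast hash) join: runs inside one mapper, no shuffle or reducer."""
    # Build side: the hinted table t1 is held in memory, keyed on the join column a.
    hash_table = defaultdict(list)
    for row in small_rows:
        hash_table[row["a"]].append(row)
    # Probe side: stream this mapper's split of t2 and emit matches immediately.
    for row in big_split:
        for match in hash_table.get(row["a"], []):  # .get does not insert missing keys
            yield {"t1": match, "t2": row}


t1 = [{"a": 1, "b": "z"}, {"a": 2, "b": "x"}]
t2_splits = [
    [{"a": 2, "c": 21}, {"a": 3, "c": 30}],  # input split for mapper 1
    [{"a": 1, "c": 10}, {"a": 2, "c": 20}],  # input split for mapper 2
]

# Job 1 (map-only): every mapper joins its split and appends to the temp file.
temp_file = []
for split in t2_splits:
    temp_file.extend(map_only_join(t1, split))

# Job 2 (map-reduce): read the temp file and sort all rows on t1.b in one reducer.
result = sorted(temp_file, key=lambda j: j["t1"]["b"])
print([(j["t1"]["a"], j["t1"]["b"], j["t2"]["c"]) for j in result])
# prints [(2, 'x', 21), (2, 'x', 20), (1, 'z', 10)]
```

In this sketch, `temp_file` comes out in t2's scan order rather than in `t1.b` order. That is why the sort still needs its own job after the map-only join.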


Thanks,
-namit


-----Original Message-----
From: Sarah Sproehnle [mailto:sarah@cloudera.com] 
Sent: Friday, April 30, 2010 5:25 PM
To: hive-user@hadoop.apache.org
Subject: mapjoin execution plan