Posted to user@hive.apache.org by Tim Kaldewey <tk...@us.ibm.com> on 2011/03/24 22:29:54 UTC

Number of map reduce jobs generated


Hello,

I noticed that for join queries that include an aggregate, Hive generates a
query plan with two MR jobs: one does the join and the second does the
aggregate. I was wondering if there is a way to hint Hive to combine these
two operations in 1 MR job. I have attached an example of the set of
queries I am looking at.

Thanks

Tim



select /*+ MAPJOIN(Table2) */ sum(t1_10 * t1_12)
  from Table1 join Table2 on (Table1.t1_6 = Table2.t2_1)
  where Table2.t2_5 = 1234
    and Table1.t1_12 >= 8 and Table1.t1_12 <= 10
    and Table1.t1_9 < 42;

To explain:
- Table2 is small, thus I chose a map-side (broadcast) join.
- When I remove the aggregate, Hive only generates 1 MR job.
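
One way to see how many jobs Hive plans for a query like this is to prefix it with EXPLAIN, which prints the plan instead of running the query. The following is only a sketch under assumptions: it reuses the table and column names from the query above, and it writes the range condition on t1_12 as two explicit comparisons, which is a guess at the intended meaning.

```sql
-- Sketch only: EXPLAIN prints the query plan without executing the query.
-- Assumption: the range condition is meant as 8 <= t1_12 AND t1_12 <= 10.
explain
select /*+ MAPJOIN(Table2) */ sum(t1_10 * t1_12)
  from Table1 join Table2 on (Table1.t1_6 = Table2.t2_1)
  where Table2.t2_5 = 1234
    and Table1.t1_12 >= 8 and Table1.t1_12 <= 10
    and Table1.t1_9 < 42;
```

The STAGE DEPENDENCIES section of the output lists the stages and how they depend on each other. Running the same EXPLAIN with and without sum(...) should show which stage the aggregate adds.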

RE: Number of map reduce jobs generated

Posted by Tim Kaldewey <tk...@us.ibm.com>.

That is what I was wondering. I would expect combining the join and the
aggregate in 1 job to be much faster, as it would save storing the
intermediate results and reading them back in. Is there a way (hint) to
tell Hive to do so?

From:	Sudhish Iyer <su...@yahoo-inc.com>
To:	"user@hive.apache.org" <us...@hive.apache.org>
Date:	03/24/2011 05:02 PM
Subject:	RE: Number of map reduce jobs generated

In this case, would 2 M/R jobs run faster than one?

RE: Number of map reduce jobs generated

Posted by Sudhish Iyer <su...@yahoo-inc.com>.

In this case, would 2 M/R jobs run faster than one?
