Posted to user@hive.apache.org by Ryan LeCompte <le...@gmail.com> on 2010/03/08 20:38:53 UTC

Concurrently running Hive queries -- mappers seem to be shared, but one job hogs all reducers!

Hey guys,

Here's a scenario:

Cluster allows a max of 90 mappers and 90 reducers.

1) Submit a large job, which immediately utilizes all mappers and all
reducers.
2) 10 minutes later, submit a second job. We notice that the cluster will
eventually allow the mapper portion of both jobs to be shared (so they both
run concurrently).

HOWEVER... The first job hogs all of the reducers and never "lets go" of
them so that the other query can have its reducers running.

Any idea how to overcome this? Is there a way to tell Hive or Hadoop to "let
go" of reducers that are currently running?

Should I limit the max reducers that a single job can use? How?

Thanks,
Ryan

Re: Concurrently running Hive queries -- mappers seem to be shared, but one job hogs all reducers!

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Mar 8, 2010 at 2:38 PM, Ryan LeCompte <le...@gmail.com> wrote:
> Hey guys,
>
> Here's a scenario:
>
> Cluster allows a max of 90 mappers and 90 reducers.
>
> 1) Submit a large job, which immediately utilizes all mappers and all
> reducers.
> 2) 10 minutes later, submit a second job. We notice that the cluster will
> eventually allow the mapper portion of both jobs to be shared (so they both
> run concurrently).
>
> HOWEVER... The first job hogs all of the reducers and never "lets go" of
> them so that the other query can have its reducers running.
>
> Any idea how to overcome this? Is there a way to tell Hive or Hadoop to "let
> go" of reducers that are currently running?
>
> Should I limit the max reducers that a single job can use? How?
>
> Thanks,
> Ryan
>
>

Ryan,

I think most of this is in the Hadoop configuration. You should be able to do:

set mapred.reduce.tasks=5;
query;

Other switches tell Hive how much data each reducer should handle.
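
For example (a sketch; these are the standard Hive reducer-sizing properties, but the exact defaults vary by version, so check your hive-default.xml):

```sql
-- Cap any single Hive job at 30 of the cluster's 90 reduce slots
set hive.exec.reducers.max=30;

-- Or raise the input data handled per reducer so Hive requests fewer
-- reducers in the first place (value is in bytes)
set hive.exec.reducers.bytes.per.reducer=1000000000;

-- Setting mapred.reduce.tasks explicitly overrides Hive's estimate entirely
set mapred.reduce.tasks=5;
```

With one of these in place before the big query, the first job never grabs all 90 reduce slots, so the second job has room to schedule its reducers.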

We are using the Fair Scheduler. From reading some JIRAs, I do not
think Hadoop supports true preemption yet.
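
Without preemption, the Fair Scheduler can still guarantee each pool a minimum share of slots as tasks finish and free up. A sketch of a fair-scheduler.xml allocation file (pool names here are hypothetical examples, not anything from your cluster):

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: each pool is guaranteed a minimum number of
     map and reduce slots, so a second job is not starved while a
     large job holds the rest of the cluster. Running tasks are not
     killed; slots are reassigned only as they free up. -->
<allocations>
  <pool name="adhoc">
    <minMaps>30</minMaps>
    <minReduces>30</minReduces>
    <weight>1.0</weight>
  </pool>
  <pool name="batch">
    <minMaps>30</minMaps>
    <minReduces>30</minReduces>
    <weight>1.0</weight>
  </pool>
</allocations>
```

A job is directed to a pool with something like set mapred.fairscheduler.pool=adhoc; before the query (property name depends on your scheduler configuration).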

I spoke with some Facebookers at Hadoop World NYC who "got around"
this (and all problems) by running multiple JobTrackers. Of course,
this is a major architectural decision.

Edward