Posted to user@hive.apache.org by Edward Capriolo <ed...@gmail.com> on 2010/03/08 21:57:01 UTC

Re: Concurrently running Hive queries -- mappers seem to be shared, but one job hogs all reducers!

On Mon, Mar 8, 2010 at 2:38 PM, Ryan LeCompte <le...@gmail.com> wrote:
> Hey guys,
>
> Here's a scenario:
>
> Cluster allows a max of 90 mappers and 90 reducers.
>
> 1) Submit a large job, which immediately utilizes all mappers and all
> reducers.
> 2) 10 minutes later, submit a second job. We notice that the cluster will
> eventually allow the mapper portion of both jobs to be shared (so they both
> run concurrently).
>
> HOWEVER... The first job hogs all of the reducers and never "lets go" of
> them so that the other query can have its reducers running.
>
> Any idea how to overcome this? Is there a way to tell Hive or Hadoop to "let
> go" of reducers that are currently running?
>
> Should I limit the max reducers that a single job can use? How?
>
> Thanks,
> Ryan
>
>

Ryan,

I think most of this is handled by the Hadoop configuration. You should be able to do:

-- cap the number of reduce tasks for queries in this session
set mapred.reduce.tasks=5;
query;

Other switches tell Hive how much data each reducer should handle, which indirectly controls how many reducers a query asks for.
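
For example (property names from memory; double-check them and the defaults for your Hive version):

-- rough number of input bytes each reducer should handle;
-- Hive divides total input size by this to pick a reducer count
set hive.exec.reducers.bytes.per.reducer=1000000000;
-- hard ceiling on reducers for any single query
set hive.exec.reducers.max=32;
query;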

We are using the Fair Scheduler. From reading some JIRAs, I do
not think Hadoop supports true preemption yet, so the first job's
reduce tasks will run to completion rather than being killed to
make room for the second query.
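
With the Fair Scheduler you can still give each job a guaranteed
minimum share via pools. A minimal sketch of an allocations file
(the pool name and numbers are made up for illustration):

<?xml version="1.0"?>
<allocations>
  <!-- guarantee this pool some slots so a new query
       is not starved by a long-running job -->
  <pool name="adhoc">
    <minMaps>20</minMaps>
    <minReduces>20</minReduces>
  </pool>
</allocations>

Point mapred.fairscheduler.allocation.file at that file in
mapred-site.xml. Without preemption, though, the minimums only
take effect as running tasks finish and slots free up.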

I spoke with some Facebookers at Hadoop World NYC who "got around"
this (and all problems) by running multiple JobTrackers. Of course,
this is a major architectural decision.

Edward