Posted to common-user@hadoop.apache.org by edward choi <mp...@gmail.com> on 2012/01/18 07:57:07 UTC

Is it possible to set how many map slots to use on each job submission?

Hi,

I often run into situations like this:
I am running a very heavy job (let's say job 1) on a Hadoop cluster,
and it takes many hours. Then something comes up that needs to be done
very quickly (let's say job 2).
Job 2 only takes a couple of hours when executed on Hadoop, but it
would take tens of hours if run on a single machine.
So I'd definitely want to use Hadoop for job 2. But since job 1 is
already running on Hadoop and hogging all the map slots, I can't run
job 2 on Hadoop (it would only be queued).

So I was wondering:
Is there a way to set a specific number of map slots (or the number of
slave nodes) to use when submitting each job?
I read that setNumMapTasks() is only a hint, not an enforced limit.
I would like to leave a couple of map slots free for occasions like
the one above.
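
For reference, a minimal sketch of the call in question, using the old
mapred API (the class name is a placeholder, not from the thread). It
only hints at the total number of map tasks and says nothing about how
many slots the job occupies concurrently:

import org.apache.hadoop.mapred.JobConf;

public class SlotHintSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf(SlotHintSketch.class);
        // setNumMapTasks() is only a hint: the real number of map tasks
        // is driven by the InputFormat's input splits, and the setting
        // says nothing about how many slots the job may use at once.
        conf.setNumMapTasks(10);
        System.out.println("mapred.map.tasks = " + conf.get("mapred.map.tasks"));
    }
}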

Ed

Re: Is it possible to set how many map slots to use on each job submission?

Posted by edward choi <mp...@gmail.com>.
Thanks for the tip, Harsh and Arun.
Exactly what I was looking for!

Regards,
Ed

2012/1/18 Arun C Murthy <ac...@hortonworks.com>

> The CapacityScheduler provides exactly this. Set up two queues with
> appropriate capacities for each:
>
> http://hadoop.apache.org/common/docs/r1.0.0/capacity_scheduler.html
>
> Arun

Re: Is it possible to set how many map slots to use on each job submission?

Posted by Arun C Murthy <ac...@hortonworks.com>.
The CapacityScheduler provides exactly this. Set up two queues with appropriate capacities for each:

http://hadoop.apache.org/common/docs/r1.0.0/capacity_scheduler.html

Arun
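
As a rough illustration of the two-queue setup (Hadoop 1.0 property
names; the queue name "urgent" and the 80/20 split are assumptions,
not values from the thread):

<!-- mapred-site.xml: enable the CapacityScheduler and declare two queues -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>default,urgent</value> <!-- "urgent" is an assumed queue name -->
</property>

<!-- capacity-scheduler.xml: give each queue a share of the slots -->
<property>
  <name>mapred.capacity-scheduler.queue.default.capacity</name>
  <value>80</value> <!-- percentages are assumptions; they must sum to 100 -->
</property>
<property>
  <name>mapred.capacity-scheduler.queue.urgent.capacity</name>
  <value>20</value>
</property>

Job 2 would then be submitted with mapred.job.queue.name set to
"urgent" (e.g. -Dmapred.job.queue.name=urgent); as job 1's map tasks
complete, the freed slots go to the urgent queue until it reaches its
guaranteed share.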


Re: Is it possible to set how many map slots to use on each job submission?

Posted by Harsh J <ha...@cloudera.com>.
Edward,

You need to invest in configuring a non-FIFO scheduler. FairScheduler may be what you are looking for. Take a look at http://hadoop.apache.org/common/docs/current/fair_scheduler.html for the docs.
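
As a rough illustration (Hadoop 1.0 property names; the pool names,
the cap of 18 maps, and the allocation-file path are assumptions, not
values from the thread):

<!-- mapred-site.xml: swap the FIFO scheduler for the FairScheduler -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/path/to/conf/fair-scheduler.xml</value> <!-- assumed path -->
</property>

<!-- fair-scheduler.xml: cap the heavy pool so some map slots stay free -->
<allocations>
  <pool name="heavy">          <!-- assumed pool name for job 1 -->
    <maxMaps>18</maxMaps>      <!-- assumed cap; leaves slots for others -->
  </pool>
  <pool name="quick">          <!-- assumed pool name for job 2 -->
    <minMaps>2</minMaps>       <!-- assumed guaranteed minimum -->
  </pool>
</allocations>

Jobs can be steered into a pool at submission time via the property
named by mapred.fairscheduler.poolnameproperty (by default, pools are
keyed on the submitting user's name).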


--
Harsh J
Customer Ops. Engineer, Cloudera