Posted to user@pig.apache.org by Jay Hacker <ja...@gmail.com> on 2011/04/26 16:30:40 UTC

Limiting the maximum number of simultaneous jobs

I have a Pig script that sometimes submits two mapreduce jobs at once.
 This runs double the number of mappers and reducers that the cluster
is configured for, which leads to oversubscription and thrashing.
This may be more of a scheduler thing, but does anyone know how to
tell Hadoop to only run one job at a time?  Thanks.

Re: Limiting the maximum number of simultaneous jobs

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
If you are using the FairScheduler, you can set this in its config:
http://hadoop.apache.org/mapreduce/docs/r0.21.0/fair_scheduler.html

Relevant bits from that doc:

The allocation file configures minimum shares, running job limits, weights
and preemption timeouts for each pool. Only users/pools whose values differ
from the defaults need to be explicitly configured in this file. The
allocation file is located in *HADOOP_HOME/conf/fair-scheduler.xml*. It can
contain the following types of elements:

   - *pool* elements, which configure each pool. These may contain the
   following sub-elements:
      - *minMaps* and *minReduces*, to set the pool's minimum share of task
      slots.
      - *maxMaps* and *maxReduces*, to set the pool's maximum concurrent
      task slots.
      - *schedulingMode*, the pool's internal scheduling mode, which can be
      *fair* for fair sharing or *fifo* for first-in-first-out.
      - *maxRunningJobs*, to limit the number of jobs from the pool to run
      at once (defaults to infinite).
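
For example, a minimal fair-scheduler.xml that caps a single pool at one
running job might look like the sketch below. The pool name "pigjobs" is
just a placeholder; substitute whichever pool your jobs actually run in:

    <?xml version="1.0"?>
    <allocations>
      <!-- cap this pool at one concurrently running job -->
      <pool name="pigjobs">
        <maxRunningJobs>1</maxRunningJobs>
      </pool>
    </allocations>

You then need your Pig-submitted jobs to land in that pool. If your
Hadoop version supports the per-job *mapred.fairscheduler.pool* property,
you can set it at the top of the Pig script:

    set mapred.fairscheduler.pool pigjobs;

Otherwise the scheduler picks the pool from whatever job property
*mapred.fairscheduler.poolnameproperty* names in the JobTracker config
(*user.name* by default), so jobs submitted by one user all share a pool
and the maxRunningJobs limit applies across them.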


On Tue, Apr 26, 2011 at 7:30 AM, Jay Hacker <ja...@gmail.com> wrote:

> I have a Pig script that sometimes submits two mapreduce jobs at once.
>  This runs double the number of mappers and reducers that the cluster
> is configured for, which leads to oversubscription and thrashing.
> This may be more of a scheduler thing, but does anyone know how to
> tell Hadoop to only run one job at a time?  Thanks.
>