Posted to common-user@hadoop.apache.org by Renaud Delbru <re...@deri.org> on 2011/01/25 12:49:44 UTC

Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

Hi,

we would like to limit the maximum number of concurrent tasks per job 
on our hadoop 0.20.2 cluster.
Will the Capacity Scheduler [1] allow us to do this? And does it work 
correctly on hadoop 0.20.2? (I remember that a few months ago, when we 
looked at it, it seemed incompatible with hadoop 0.20.2.)

[1] http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html

Regards,
-- 
Renaud Delbru

Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

Posted by Renaud Delbru <re...@deri.org>.
Hi Allen,

thanks for pointing this out.

On 28/01/11 17:34, Allen Wittenauer wrote:
>> As it seems that the capacity and fair schedulers in hadoop 0.20.2 do not allow a hard upper limit on the number of concurrent tasks, does anybody know of any other solution to achieve this?
> The specific change for the capacity scheduler has been backported to 0.20.2 as part of https://issues.apache.org/jira/browse/MAPREDUCE-1105 .  Note that you'll also need https://issues.apache.org/jira/browse/MAPREDUCE-1160 , which fixes a logging bug in the JobTracker; otherwise your logs will fill up.
-- 
Renaud Delbru

Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Jan 25, 2011, at 12:48 PM, Renaud Delbru wrote:

> As it seems that the capacity and fair schedulers in hadoop 0.20.2 do not allow a hard upper limit on the number of concurrent tasks, does anybody know of any other solution to achieve this?

The specific change for the capacity scheduler has been backported to 0.20.2 as part of https://issues.apache.org/jira/browse/MAPREDUCE-1105 .  Note that you'll also need https://issues.apache.org/jira/browse/MAPREDUCE-1160 , which fixes a logging bug in the JobTracker; otherwise your logs will fill up.
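
For illustration, the hard cap that this backport enables would be set per 
queue in capacity-scheduler.xml. This is a sketch only: the queue name 
"limited" and the value are made up, and the property name assumes the 
maximum-capacity parameter that MAPREDUCE-1105 describes:

  <!-- capacity-scheduler.xml (sketch): "limited" is a hypothetical queue;
       maximum-capacity is assumed to be the hard cap from MAPREDUCE-1105 -->
  <property>
    <name>mapred.capacity-scheduler.queue.limited.maximum-capacity</name>
    <value>25</value> <!-- never use more than 25% of the cluster's slots -->
  </property>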


Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

Posted by Renaud Delbru <re...@deri.org>.
Thanks, we will try to test it next week.
-- 
Renaud Delbru

On 27/01/11 11:31, Steve Loughran wrote:
> On 27/01/11 10:51, Renaud Delbru wrote:
>> Hi Koji,
>>
>> thanks for sharing the information,
>> Is the 0.20-security branch planned to become an official release at
>> some point?
>>
>> Cheers
>
> If you can play with the beta, you can check that it works for you and, 
> if not, get bugs fixed during the beta cycle:
>
> http://people.apache.org/~acmurthy/hadoop-0.20.100-rc0/


Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

Posted by Steve Loughran <st...@apache.org>.
On 27/01/11 10:51, Renaud Delbru wrote:
> Hi Koji,
>
> thanks for sharing the information,
> Is the 0.20-security branch planned to become an official release at
> some point?
>
> Cheers

If you can play with the beta, you can check that it works for you and, 
if not, get bugs fixed during the beta cycle:

http://people.apache.org/~acmurthy/hadoop-0.20.100-rc0/

Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

Posted by Renaud Delbru <re...@deri.org>.
Hi Koji,

thanks for sharing the information,
Is the 0.20-security branch planned to become an official release at some point?

Cheers
-- 
Renaud Delbru

On 27/01/11 01:50, Koji Noguchi wrote:
> Hi Renaud,
>
> Hopefully it'll be in the 0.20-security branch that Arun is trying to push.
>
> A related (very abstract) JIRA:
> https://issues.apache.org/jira/browse/MAPREDUCE-1872
>
> Koji
>
>
>
> On 1/25/11 12:48 PM, "Renaud Delbru" <re...@deri.org> wrote:
>
>     As it seems that the capacity and fair schedulers in hadoop 0.20.2 do
>     not allow a hard upper limit on the number of concurrent tasks, does
>     anybody know of any other solution to achieve this?
>     --
>     Renaud Delbru
>
>     On 25/01/11 11:49, Renaud Delbru wrote:
>     > Hi,
>     >
>     > we would like to limit the maximum number of concurrent tasks per
>     > job on our hadoop 0.20.2 cluster.
>     > Will the Capacity Scheduler [1] allow us to do this? And does it work
>     > correctly on hadoop 0.20.2? (I remember that a few months ago, when
>     > we looked at it, it seemed incompatible with hadoop 0.20.2.)
>     >
>     > [1]
>     http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html
>     >
>     > Regards,
>
>


Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

Posted by Koji Noguchi <kn...@yahoo-inc.com>.
Hi Renaud,

Hopefully it'll be in the 0.20-security branch that Arun is trying to push.

A related (very abstract) JIRA:
https://issues.apache.org/jira/browse/MAPREDUCE-1872

Koji



On 1/25/11 12:48 PM, "Renaud Delbru" <re...@deri.org> wrote:

As it seems that the capacity and fair schedulers in hadoop 0.20.2 do
not allow a hard upper limit on the number of concurrent tasks, does
anybody know of any other solution to achieve this?
--
Renaud Delbru

On 25/01/11 11:49, Renaud Delbru wrote:
> Hi,
>
> we would like to limit the maximum number of concurrent tasks per
> job on our hadoop 0.20.2 cluster.
> Will the Capacity Scheduler [1] allow us to do this? And does it work
> correctly on hadoop 0.20.2? (I remember that a few months ago, when
> we looked at it, it seemed incompatible with hadoop 0.20.2.)
>
> [1] http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html
>
> Regards,



Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

Posted by Renaud Delbru <re...@deri.org>.
As it seems that the capacity and fair schedulers in hadoop 0.20.2 do 
not allow a hard upper limit on the number of concurrent tasks, does 
anybody know of any other solution to achieve this?
-- 
Renaud Delbru

On 25/01/11 11:49, Renaud Delbru wrote:
> Hi,
>
> we would like to limit the maximum number of concurrent tasks per
> job on our hadoop 0.20.2 cluster.
> Will the Capacity Scheduler [1] allow us to do this? And does it work
> correctly on hadoop 0.20.2? (I remember that a few months ago, when
> we looked at it, it seemed incompatible with hadoop 0.20.2.)
>
> [1] http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html
>
> Regards,


Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

Posted by Harsh J <qw...@gmail.com>.
No, that is right. I did not assume that it was a strict slot limit
that you were looking to impose for your jobs.

On Tue, Jan 25, 2011 at 9:27 PM, Renaud Delbru <re...@deri.org> wrote:
> Our experience with the Capacity Scheduler did not match what we expected or
> what you describe. But that might be due to a misunderstanding of the
> configuration parameters.
> The problem is the following:
> mapred.capacity-scheduler.queue.<queue-name>.capacity: Percentage of the
> number of slots in the cluster that are *guaranteed* to be available for
> jobs in this queue.
> mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent:
> Each queue enforces a limit on the percentage of resources allocated to a
> user at any given time, if *there is competition for them*.
>
> So, in fact, it seems that if there is no competition and the cluster
> is fully available, the scheduler will assign the full cluster to the job
> and will not limit the number of concurrent tasks. It seemed to us that the
> only way to enforce a hard limit was to use the Fair Scheduler of hadoop
> 0.21.0, which includes a new configuration parameter, 'maxMaps'.
>
> Am I right, or did we miss something?
>
> cheers
> --
> Renaud Delbru
>
> On 25/01/11 15:20, Harsh J wrote:
>>
>> The Capacity Scheduler (or a version of it) does ship with the 0.20
>> release of Hadoop and is usable. It can be used to define queues, each
>> with a limited capacity, to which your jobs must be submitted if you
>> want them to use only their assigned fraction of the cluster.
>>
>> On Tue, Jan 25, 2011 at 5:19 PM, Renaud Delbru<re...@deri.org>
>>  wrote:
>>>
>>> Hi,
>>>
>>> we would like to limit the maximum number of concurrent tasks per job on
>>> our hadoop 0.20.2 cluster.
>>> Will the Capacity Scheduler [1] allow us to do this? And does it work
>>> correctly on hadoop 0.20.2? (I remember that a few months ago, when we
>>> looked at it, it seemed incompatible with hadoop 0.20.2.)
>>>
>>> [1] http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html
>>>
>>> Regards,
>>> --
>>> Renaud Delbru
>>>
>>
>>
>
>



-- 
Harsh J
www.harshj.com

Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

Posted by Renaud Delbru <re...@deri.org>.
Our experience with the Capacity Scheduler did not match what we expected 
or what you describe. But that might be due to a misunderstanding of the 
configuration parameters.
The problem is the following:
mapred.capacity-scheduler.queue.<queue-name>.capacity: Percentage of the 
number of slots in the cluster that are *guaranteed* to be available for 
jobs in this queue.
mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent: 
Each queue enforces a limit on the percentage of resources allocated to 
a user at any given time, if *there is competition for them*.
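
For reference, in capacity-scheduler.xml these two parameters look roughly 
as follows; the queue name "ourqueue" and the values are illustrative only:

  <property>
    <name>mapred.capacity-scheduler.queue.ourqueue.capacity</name>
    <value>20</value> <!-- 20% of cluster slots guaranteed to this queue -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.ourqueue.minimum-user-limit-percent</name>
    <value>100</value> <!-- per-user limit, applied only under competition -->
  </property>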

So, in fact, it seems that if there is no competition and the cluster 
is fully available, the scheduler will assign the full cluster to the 
job and will not limit the number of concurrent tasks. It seemed to us 
that the only way to enforce a hard limit was to use the Fair Scheduler 
of hadoop 0.21.0, which includes a new configuration parameter, 
'maxMaps'.
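
For illustration, a hard cap in the 0.21 Fair Scheduler is declared per 
pool in the allocation file; the pool name "limited" and the numbers below 
are hypothetical, a sketch of the idea rather than a tested configuration:

  <?xml version="1.0"?>
  <allocations>
    <pool name="limited">
      <maxMaps>20</maxMaps>        <!-- hard cap on concurrent map tasks -->
      <maxReduces>10</maxReduces>  <!-- hard cap on concurrent reduce tasks -->
    </pool>
  </allocations>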

Am I right, or did we miss something?

cheers
-- 
Renaud Delbru

On 25/01/11 15:20, Harsh J wrote:
> The Capacity Scheduler (or a version of it) does ship with the 0.20
> release of Hadoop and is usable. It can be used to define queues, each
> with a limited capacity, to which your jobs must be submitted if you
> want them to use only their assigned fraction of the cluster.
>
> On Tue, Jan 25, 2011 at 5:19 PM, Renaud Delbru<re...@deri.org>  wrote:
>> Hi,
>>
>> we would like to limit the maximum number of concurrent tasks per job on
>> our hadoop 0.20.2 cluster.
>> Will the Capacity Scheduler [1] allow us to do this? And does it work
>> correctly on hadoop 0.20.2? (I remember that a few months ago, when we
>> looked at it, it seemed incompatible with hadoop 0.20.2.)
>>
>> [1] http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html
>>
>> Regards,
>> --
>> Renaud Delbru
>>
>
>


Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

Posted by Harsh J <qw...@gmail.com>.
The Capacity Scheduler (or a version of it) does ship with the 0.20
release of Hadoop and is usable. It can be used to define queues, each
with a limited capacity, to which your jobs must be submitted if you
want them to use only their assigned fraction of the cluster.
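
For example, a job is directed at a particular queue by setting 
mapred.job.queue.name at submission time; the queue name "limited" here is 
just a placeholder for whatever queue you define:

  <!-- in the job configuration (or -Dmapred.job.queue.name=limited
       on the command line) -->
  <property>
    <name>mapred.job.queue.name</name>
    <value>limited</value> <!-- submit this job to the capped queue -->
  </property>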

On Tue, Jan 25, 2011 at 5:19 PM, Renaud Delbru <re...@deri.org> wrote:
> Hi,
>
> we would like to limit the maximum number of concurrent tasks per job on
> our hadoop 0.20.2 cluster.
> Will the Capacity Scheduler [1] allow us to do this? And does it work
> correctly on hadoop 0.20.2? (I remember that a few months ago, when we
> looked at it, it seemed incompatible with hadoop 0.20.2.)
>
> [1] http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html
>
> Regards,
> --
> Renaud Delbru
>



-- 
Harsh J
www.harshj.com