Posted to user@hadoop.apache.org by Hans-Peter Zorn <hp...@gmail.com> on 2013/07/30 11:33:59 UTC

Capacity Scheduler on YARN

Hi,

I would like to use the capacity scheduler to implement the following
requirements.

3 groups of users:
* two of them are each entitled to 50% of the cluster capacity but may use
excess capacity if available
* the third group may only use capacity that is currently not used by either
of the other two.

I set up three queues:
yarn.scheduler.capacity.root.a.capacity 50
yarn.scheduler.capacity.root.a.maximum-capacity 100

yarn.scheduler.capacity.root.b.capacity 49
yarn.scheduler.capacity.root.b.maximum-capacity 100

yarn.scheduler.capacity.root.c.capacity 1
yarn.scheduler.capacity.root.c.maximum-capacity 100
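
For reference, name/value pairs like these live as <property> entries in the Capacity Scheduler's configuration file, conventionally capacity-scheduler.xml. The Python sketch below is a hypothetical helper (not part of Hadoop) that renders the three queues in that form and checks two constraints: the capacities of root's child queues must add up to 100, and no queue's capacity may exceed its maximum-capacity. It also emits yarn.scheduler.capacity.root.queues, which declares the child queues; the post does not show it, so it is assumed to be set in the real setup.

```python
# Hypothetical helper: renders the queue settings from the post as
# <property> entries for capacity-scheduler.xml and sanity-checks them.
import xml.etree.ElementTree as ET

QUEUES = {
    # queue name: (capacity %, maximum-capacity %)
    "a": (50, 100),
    "b": (49, 100),
    "c": (1, 100),
}


def build_config(queues):
    """Return a <configuration> element for the given child queues of root."""
    total = sum(cap for cap, _ in queues.values())
    if total != 100:
        raise ValueError(f"capacities of root's children sum to {total}, expected 100")
    # Assumed: the queue list property is present in the real configuration.
    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, (cap, max_cap) in queues.items():
        # This sketch only accepts explicit percentages (no special values).
        if not 0 < cap <= max_cap <= 100:
            raise ValueError(f"queue {name}: need 0 < capacity <= maximum-capacity <= 100")
        prefix = f"yarn.scheduler.capacity.root.{name}"
        props[f"{prefix}.capacity"] = str(cap)
        props[f"{prefix}.maximum-capacity"] = str(max_cap)

    conf = ET.Element("configuration")
    for key, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    return conf


if __name__ == "__main__":
    root = build_config(QUEUES)
    ET.indent(root)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(root, encoding="unicode"))
```

Note that these two checks pass for the settings above, so they do not explain the behavior described next; they only confirm the queue layout itself is valid.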

These values are active; they show up in the scheduler web GUI.

However, users of each queue can only use the queue's assigned capacity; the
value of maximum-capacity has no effect.

Did I misunderstand the concept behind the maximum-capacity setting?

I am using Hadoop as shipped with Cloudera CDH4.3 (2.0.0-cdh4.3.0). I already
asked this question on cdh-users, but I guess the Capacity Scheduler is not
used much with CDH.

Or is it possible to implement this using a different scheduler?

Best & thanks,
Hans-Peter

Re: Capacity Scheduler on YARN

Posted by Hans-Peter Zorn <zo...@algo.informatik.tu-darmstadt.de>.
Thank you, Arun.

Indeed, it is in the documentation. I just didn't realize that those two
parameters are connected. It now works as it is supposed to.

Best,
Hans-Peter


On Tue, Jul 30, 2013 at 1:52 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Hans-Peter,
>
>  By default, the Capacity Scheduler (CS) caps a single user at the queue's
> capacity (not its max-capacity). This is why any given user cannot go
> beyond the queue's capacity; in other words, you need more than one
> user to utilize the queue's max-capacity.
>
>  If you want a single user to exceed the queue's capacity, set
> user-limit-factor (a multiple of the queue's capacity); see
> yarn.scheduler.capacity.<queue-path>.user-limit-factor in:
>
> http://hadoop.apache.org/docs/r2.0.5-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>
> hth,
> Arun
>
> On Jul 30, 2013, at 2:33 AM, Hans-Peter Zorn <hp...@gmail.com> wrote:
>
> Hi,
>
> I would like to use the capacity scheduler to implement the following
> requirements.
>
> 3 groups of users:
> * two of them are each entitled to 50% of the cluster capacity but may use
> excess capacity if available
> * the third group may only use capacity that is currently not used by either
> of the other two.
>
> I set up three queues:
> yarn.scheduler.capacity.root.a.capacity 50
> yarn.scheduler.capacity.root.a.maximum-capacity 100
>
> yarn.scheduler.capacity.root.b.capacity 49
> yarn.scheduler.capacity.root.b.maximum-capacity 100
>
> yarn.scheduler.capacity.root.c.capacity 1
> yarn.scheduler.capacity.root.c.maximum-capacity 100
>
> These values are active; they show up in the scheduler web GUI.
>
> However, users of each queue can only use the queue's assigned capacity; the
> value of maximum-capacity has no effect.
>
> Did I misunderstand the concept behind the maximum-capacity setting?
>
> I am using Hadoop as shipped with Cloudera CDH4.3 (2.0.0-cdh4.3.0). I already
> asked this question on cdh-users, but I guess the Capacity Scheduler is not
> used much with CDH.
>
> Or is it possible to implement this using a different scheduler?
>
> Best & thanks,
> Hans-Peter
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: Capacity Scheduler on YARN

Posted by Arun C Murthy <ac...@hortonworks.com>.
Hans-Peter,

 By default, the Capacity Scheduler (CS) caps a single user at the queue's capacity (not its max-capacity). This is why any given user cannot go beyond the queue's capacity; in other words, you need more than one user to utilize the queue's max-capacity.

 If you want a single user to exceed the queue's capacity, set user-limit-factor (a multiple of the queue's capacity); see yarn.scheduler.capacity.<queue-path>.user-limit-factor in:
 http://hadoop.apache.org/docs/r2.0.5-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
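
Arun's explanation can be turned into a rough back-of-the-envelope calculation. The Python sketch below uses a simplified model, which is an assumption and not the Capacity Scheduler's actual algorithm: one user is capped at the queue's capacity times user-limit-factor, never beyond maximum-capacity, and several users together still cannot push the queue past maximum-capacity. It ignores minimum-user-limit-percent, demand from other queues, and current cluster usage; the function names are hypothetical.

```python
# Simplified model (assumption), not the scheduler's real algorithm: it ignores
# minimum-user-limit-percent, other queues' demand and current cluster usage.


def single_user_ceiling(capacity, max_capacity, user_limit_factor=1.0):
    """Upper bound (in % of the cluster) on what one user may get from a queue."""
    return min(capacity * user_limit_factor, max_capacity)


def queue_ceiling(capacity, max_capacity, user_limit_factor, active_users):
    """Upper bound on the whole queue when every active user hits their cap."""
    per_user = single_user_ceiling(capacity, max_capacity, user_limit_factor)
    return min(active_users * per_user, max_capacity)


if __name__ == "__main__":
    # Queue settings from the original post; maximum-capacity is 100 for all.
    for queue, capacity, factor in [("a", 50, 1), ("a", 50, 2), ("c", 1, 1), ("c", 1, 100)]:
        ceiling = single_user_ceiling(capacity, 100, factor)
        print(f"queue {queue}, user-limit-factor {factor}: "
              f"one user can reach at most {ceiling:g}% of the cluster")
```

In this model the default factor of 1 reproduces what the original post observed: a single user in queue a stays at 50%, and a single user in queue c stays at 1%. Queue a would need a user-limit-factor of 2, and queue c a factor of 100, before one user could use the whole cluster.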

hth,
Arun

On Jul 30, 2013, at 2:33 AM, Hans-Peter Zorn <hp...@gmail.com> wrote:

> Hi,
> 
> I would like to use the capacity scheduler to implement the following requirements.
> 
> 3 groups of users:
> * two of them are each entitled to 50% of the cluster capacity but may use excess capacity if available
> * the third group may only use capacity that is currently not used by either of the other two.
> 
> I set up three queues:
> yarn.scheduler.capacity.root.a.capacity 50
> yarn.scheduler.capacity.root.a.maximum-capacity 100
> 
> yarn.scheduler.capacity.root.b.capacity 49
> yarn.scheduler.capacity.root.b.maximum-capacity 100
> 
> yarn.scheduler.capacity.root.c.capacity 1
> yarn.scheduler.capacity.root.c.maximum-capacity 100
> 
> These values are active; they show up in the scheduler web GUI.
> 
> However, users of each queue can only use the queue's assigned capacity; the value of maximum-capacity has no effect.
> 
> Did I misunderstand the concept behind the maximum-capacity setting?
> 
> I am using Hadoop as shipped with Cloudera CDH4.3 (2.0.0-cdh4.3.0). I already asked this question on cdh-users, but I guess the Capacity Scheduler is not used much with CDH.
> 
> Or is it possible to implement this using a different scheduler?
> 
> Best & thanks,
> Hans-Peter

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


