Posted to mapreduce-user@hadoop.apache.org by Mehmet Belgin <me...@oit.gatech.edu> on 2013/05/21 23:43:27 UTC

Is there a way to limit # of hadoop tasks per user at runtime?

Hi Everyone,

I was wondering if there is a way to limit the number of tasks (map+reduce) *per user* at runtime, perhaps using an environment variable. I am asking this from a resource-provisioning perspective: I am trying to come up with an N-token licensing system that lets multiple users share our limited Hadoop resources simultaneously. That is, when user A checks out 6 tokens, he/she can run only 6 Hadoop tasks.
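The N-token idea above can be modeled as a small ledger. The ledger hands out tokens from a fixed pool and refuses a checkout once the pool is exhausted. This is a minimal sketch built on assumptions. `TokenLedger`, `checkout`, `release` and `limits` are hypothetical names. One token is assumed to equal one concurrent task. Nothing here talks to Hadoop, so the per-user numbers the ledger produces would still have to be pushed into a scheduler's configuration.

```python
import threading


class TokenLedger:
    """Hypothetical N-token pool: one token is assumed to be one task slot."""

    def __init__(self, total_tokens: int) -> None:
        if total_tokens <= 0:
            raise ValueError("total_tokens must be positive")
        self._total = total_tokens
        self._held: dict[str, int] = {}  # user -> tokens currently checked out
        self._lock = threading.Lock()

    def available(self) -> int:
        """Tokens not yet checked out by anyone."""
        with self._lock:
            return self._total - sum(self._held.values())

    def checkout(self, user: str, tokens: int) -> bool:
        """Grant `tokens` to `user` if enough remain; return False otherwise."""
        if tokens <= 0:
            raise ValueError("tokens must be positive")
        with self._lock:
            free = self._total - sum(self._held.values())
            if tokens > free:
                return False
            self._held[user] = self._held.get(user, 0) + tokens
            return True

    def release(self, user: str) -> int:
        """Return all of `user`'s tokens to the pool; report how many were freed."""
        with self._lock:
            return self._held.pop(user, 0)

    def limits(self) -> dict[str, int]:
        """Snapshot of per-user task limits, e.g. to feed a scheduler config."""
        with self._lock:
            return dict(self._held)


if __name__ == "__main__":
    ledger = TokenLedger(total_tokens=10)
    print(ledger.checkout("userA", 6))
    print(ledger.checkout("userB", 6))  # refused: only 4 tokens remain
    print(ledger.limits())
```

A single lock guards the counts, so two concurrent checkouts cannot together oversubscribe the pool.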

If Hadoop has no such feature, has anyone tried to integrate it with Torque/Moab (or any other resource manager or scheduler)? Any advice in that direction will be appreciated :)

Thanks in advance,
-Mehmet

Re: Is there a way to limit # of hadoop tasks per user at runtime?

Posted by Harsh J <ha...@cloudera.com>.
The only pain point I'd find with the Capacity Scheduler (CS) in a
multi-user environment is that it is limited to queue-level configuration.
It's non-trivial to configure a queue per user, since CS doesn't provide any
user-level settings (it wasn't initially designed for that), whereas with
the Fair Scheduler (FS) you get user-level limits for free, while also being
able to define pools (per user, or more generally per some property, such
as a queue name).
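The sketch below illustrates the user-level settings mentioned above. It builds an MR1 fair-scheduler allocations file that limits users directly, without defining any queue or pool for them. It assumes the `<userMaxJobsDefault>` and `<user name="..."><maxRunningJobs>` elements from the MR1 fair scheduler documentation. The MR2 counterparts are `<userMaxAppsDefault>` and `<maxRunningApps>`. `user_limits_allocations` is a hypothetical helper. These user-level knobs cap concurrently running *jobs*, not tasks, so on their own they only approximate a per-user task budget.

```python
import xml.etree.ElementTree as ET


def user_limits_allocations(default_max_jobs: int,
                            overrides: dict[str, int]) -> ET.Element:
    """MR1 fair-scheduler allocations with user-level limits and no pools.

    `default_max_jobs` applies to every user; `overrides` maps a user name to
    that user's own cap on concurrently running jobs.
    """
    root = ET.Element("allocations")
    ET.SubElement(root, "userMaxJobsDefault").text = str(default_max_jobs)
    for user, max_jobs in sorted(overrides.items()):
        user_el = ET.SubElement(root, "user", name=user)
        ET.SubElement(user_el, "maxRunningJobs").text = str(max_jobs)
    return root


if __name__ == "__main__":
    tree = user_limits_allocations(2, {"userA": 5})
    print(ET.tostring(tree, encoding="unicode"))
```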


On Thu, May 23, 2013 at 10:55 PM, Amal G Jose <am...@gmail.com> wrote:

> You can also use the capacity scheduler. With it, you can create several
> queues, each with a specific capacity. You can then submit jobs to a
> specific queue at runtime, or configure it for direct submission.
>
>
> On Wed, May 22, 2013 at 3:27 AM, Sandy Ryza <sa...@cloudera.com> wrote:
>
>> Hi Mehmet,
>>
>> Are you using MR1 or MR2?
>>
>> The fair scheduler, which is present in both versions (though configured
>> slightly differently), allows you to limit the number of map and reduce
>> tasks in a queue. The configuration can be updated at runtime by modifying
>> the scheduler's allocations file. It also has a feature that automatically
>> maps jobs to queues based on the user who submitted them.
>>
>> Here are links to documentation in MR1 and MR2:
>> http://hadoop.apache.org/docs/stable/fair_scheduler.html
>>
>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>
>> -Sandy
>>
>>
>>
>> On Tue, May 21, 2013 at 2:43 PM, Mehmet Belgin <mehmet.belgin@oit.gatech.edu> wrote:
>>
>>> Hi Everyone,
>>>
>>> I was wondering if there is a way to limit the number of tasks
>>> (map+reduce) *per user* at runtime, perhaps using an environment variable.
>>> I am asking this from a resource-provisioning perspective: I am trying to
>>> come up with an N-token licensing system that lets multiple users share
>>> our limited Hadoop resources simultaneously. That is, when user A checks
>>> out 6 tokens, he/she can run only 6 Hadoop tasks.
>>>
>>> If Hadoop has no such feature, has anyone tried to integrate it with
>>> Torque/Moab (or any other resource manager or scheduler)? Any advice in
>>> that direction will be appreciated :)
>>>
>>> Thanks in advance,
>>> -Mehmet
>>
>


-- 
Harsh J

Re: Is there a way to limit # of hadoop tasks per user at runtime?

Posted by Amal G Jose <am...@gmail.com>.
You can also use the capacity scheduler. With it, you can create several
queues, each with a specific capacity. You can then submit jobs to a
specific queue at runtime, or configure it for direct submission.
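A queue-per-user capacity scheduler setup can be generated rather than written by hand. This is a minimal sketch that assumes YARN (MR2) property names. It emits `capacity-scheduler.xml` properties with one child queue of `root` per user, using `yarn.scheduler.capacity.root.queues` and the per-queue `.capacity` and `.maximum-capacity` keys. Setting `maximum-capacity` equal to `capacity` is a choice made here to stop a queue from borrowing idle capacity. The queue names and shares are hypothetical. MR1's capacity scheduler uses different property names (`mapred.capacity-scheduler.queue.<name>.*`).

```python
import xml.etree.ElementTree as ET


def capacity_scheduler_conf(shares: dict[str, float]) -> ET.Element:
    """Build <configuration> properties with one child queue of root per user.

    `shares` maps queue name -> percent of cluster capacity; the capacity
    scheduler expects sibling queue capacities to add up to 100.
    """
    total = sum(shares.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"queue capacities sum to {total}, expected 100")
    conf = ET.Element("configuration")

    def prop(name: str, value: str) -> None:
        p = ET.SubElement(conf, "property")
        ET.SubElement(p, "name").text = name
        ET.SubElement(p, "value").text = value

    queues = sorted(shares)
    prop("yarn.scheduler.capacity.root.queues", ",".join(queues))
    for q in queues:
        prefix = f"yarn.scheduler.capacity.root.{q}"
        prop(f"{prefix}.capacity", str(shares[q]))
        # Cap each queue at its guaranteed share (no elastic borrowing).
        prop(f"{prefix}.maximum-capacity", str(shares[q]))
    return conf


if __name__ == "__main__":
    print(ET.tostring(capacity_scheduler_conf({"userA": 60, "userB": 40}),
                      encoding="unicode"))
```

In YARN, `yarn rmadmin -refreshQueues` asks the ResourceManager to reload this file without a restart. A job can be sent to a queue with `-Dmapreduce.job.queuename=userA` (MR1: `-Dmapred.job.queue.name=userA`), assuming the job parses generic options via `ToolRunner`.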


On Wed, May 22, 2013 at 3:27 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> Hi Mehmet,
>
> Are you using MR1 or MR2?
>
> The fair scheduler, which is present in both versions (though configured
> slightly differently), allows you to limit the number of map and reduce
> tasks in a queue. The configuration can be updated at runtime by modifying
> the scheduler's allocations file. It also has a feature that automatically
> maps jobs to queues based on the user who submitted them.
>
> Here are links to documentation in MR1 and MR2:
> http://hadoop.apache.org/docs/stable/fair_scheduler.html
>
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>
> -Sandy
>
>
>
> On Tue, May 21, 2013 at 2:43 PM, Mehmet Belgin <mehmet.belgin@oit.gatech.edu> wrote:
>
>> Hi Everyone,
>>
>> I was wondering if there is a way to limit the number of tasks
>> (map+reduce) *per user* at runtime, perhaps using an environment variable.
>> I am asking this from a resource-provisioning perspective: I am trying to
>> come up with an N-token licensing system that lets multiple users share
>> our limited Hadoop resources simultaneously. That is, when user A checks
>> out 6 tokens, he/she can run only 6 Hadoop tasks.
>>
>> If Hadoop has no such feature, has anyone tried to integrate it with
>> Torque/Moab (or any other resource manager or scheduler)? Any advice in
>> that direction will be appreciated :)
>>
>> Thanks in advance,
>> -Mehmet
>

Re: Is there a way to limit # of hadoop tasks per user at runtime?

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Mehmet,

Are you using MR1 or MR2?

The fair scheduler, which is present in both versions (though configured
slightly differently), allows you to limit the number of map and reduce
tasks in a queue. The configuration can be updated at runtime by modifying
the scheduler's allocations file. It also has a feature that automatically
maps jobs to queues based on the user who submitted them.
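For MR1, the allocations file described above could be regenerated from a token table. This is a minimal sketch that assumes the `<pool>`, `<maxMaps>` and `<maxReduces>` elements from the MR1 fair scheduler documentation. It also assumes the default of one pool per submitting user (`mapred.fairscheduler.poolnameproperty` defaults to `user.name`). MR1 caps map and reduce tasks separately, so a combined map+reduce budget has to be split. The half-and-half `split_tokens` policy is a hypothetical choice, and the helper names are hypothetical too.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def split_tokens(tokens: int) -> tuple[int, int]:
    """Hypothetical policy: split a map+reduce budget into (maxMaps, maxReduces).

    Maps get the larger half when the budget is odd.
    """
    reduces = tokens // 2
    return tokens - reduces, reduces


def build_allocations(limits: dict[str, int]) -> ET.Element:
    """Build an MR1 fair-scheduler <allocations> tree with one pool per user."""
    root = ET.Element("allocations")
    for user, tokens in sorted(limits.items()):
        maps, reduces = split_tokens(tokens)
        pool = ET.SubElement(root, "pool", name=user)
        ET.SubElement(pool, "maxMaps").text = str(maps)
        ET.SubElement(pool, "maxReduces").text = str(reduces)
    return root


def write_atomically(root: ET.Element, path: str) -> None:
    """Write to a temp file in the target directory, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as f:
            ET.ElementTree(root).write(f, encoding="utf-8", xml_declaration=True)
        # mkstemp creates the file as 0600; make it readable by the daemon's user.
        os.chmod(tmp, 0o644)
        os.replace(tmp, path)  # atomic rename on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise


if __name__ == "__main__":
    print(ET.tostring(build_allocations({"userA": 6, "userB": 4}),
                      encoding="unicode"))
```

The target path would be whatever `mapred.fairscheduler.allocation.file` points to. The file is written to a temporary path first and then renamed over the target. This way the scheduler's periodic reload should never see a half-written file.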

Here are links to documentation in MR1 and MR2:
http://hadoop.apache.org/docs/stable/fair_scheduler.html
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
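The MR2 (YARN) fair scheduler limits queues by resources rather than by separate map and reduce counts. This sketch assumes the `"X mb, Y vcores"` format for `<maxResources>` described in the current FairScheduler documentation. Older 2.0.x releases may differ, so check the docs for your version. It also assumes a hypothetical fixed container size (`CONTAINER_MB`), so that one token corresponds to one container. Each MR2 job's ApplicationMaster also runs in a container charged to its queue, so part of each queue's cap goes to the AM rather than to tasks.

```python
import xml.etree.ElementTree as ET

# Hypothetical: memory of one task container on this cluster, in MB.
CONTAINER_MB = 1024


def build_yarn_allocations(limits: dict[str, int],
                           container_mb: int = CONTAINER_MB) -> ET.Element:
    """Build an MR2 fair-scheduler <allocations> tree, one <queue> per user.

    Each token is treated as one container of `container_mb` MB and 1 vcore.
    """
    root = ET.Element("allocations")
    for user, tokens in sorted(limits.items()):
        queue = ET.SubElement(root, "queue", name=user)
        ET.SubElement(queue, "maxResources").text = (
            f"{tokens * container_mb} mb, {tokens} vcores")
    return root


if __name__ == "__main__":
    print(ET.tostring(build_yarn_allocations({"userA": 6}), encoding="unicode"))
```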

-Sandy



On Tue, May 21, 2013 at 2:43 PM, Mehmet Belgin <mehmet.belgin@oit.gatech.edu> wrote:

> Hi Everyone,
>
> I was wondering if there is a way to limit the number of tasks
> (map+reduce) *per user* at runtime, perhaps using an environment variable.
> I am asking this from a resource-provisioning perspective: I am trying to
> come up with an N-token licensing system that lets multiple users share
> our limited Hadoop resources simultaneously. That is, when user A checks
> out 6 tokens, he/she can run only 6 Hadoop tasks.
>
> If Hadoop has no such feature, has anyone tried to integrate it with
> Torque/Moab (or any other resource manager or scheduler)? Any advice in
> that direction will be appreciated :)
>
> Thanks in advance,
> -Mehmet
