Posted to mapreduce-user@hadoop.apache.org by Pallavi Palleti <pa...@corp.aol.com> on 2010/01/15 06:05:45 UTC

Query over efficient utilization of cluster using fair scheduling

Hi all,

I am experimenting with the fair scheduler on a cluster of 10 machines.
The users are given the default values ("0") for minMaps and minReduces
in the fair scheduler parameters. When I run two jobs under the same
username, the fair scheduler gives a 100% fair share to the first job
(which needs 2 mappers), while the second job (which needs 10 mappers)
stays in waiting mode even though the cluster is completely idle.
Running these jobs simultaneously would take only 10% of the total
available mappers. However, the second job is not allowed to run until
the first job is over. It would be great if someone could suggest some
parameter tuning that allows efficient utilization of the cluster. By
efficient, I mean allowing jobs to run when the cluster is idle rather
than leaving them in waiting mode. I am not sure whether setting
"minMaps" and "minReduces" for each user would resolve the issue.
Kindly clarify.

Thanks
Pallavi

Re: Query over efficient utilization of cluster using fair scheduling

Posted by Pallavi Palleti <pa...@corp.aol.com>.
Thanks, Todd. I had gone through the documentation earlier, but these
points were not very clear to me. This will help me experiment further.
Thanks for the information. :-)

Regards
Pallavi

Todd Lipcon wrote:
> Hi Pallavi,
>
> If you remove userMaxJobsDefault, the default value is 
> Integer.MAX_VALUE - that is, it's unconstrained by this limit. This 
> means that the other limits and fair sharing would kick in if multiple 
> jobs are submitted. So, if you haven't set any of the min-slots, and 
> the jobs are all at the same priority, they'll share the number of 
> slots equally. Please check out the fair scheduler documentation in 
> docs/fair_scheduler.pdf in your distro.
>
> -Todd
>
> On Fri, Jan 15, 2010 at 1:15 AM, Pallavi Palleti
> <pallavi.palleti@corp.aol.com> wrote:
>
>     Hi Todd,
>
>     Thanks for the reply. I figured out that *userMaxJobsDefault* was
>     set to 1. I have another question on the same topic. What will
>     happen if I remove the *userMaxJobsDefault* property? What is the
>     default value? Would setting a value higher than 1 for a
>     particular user cause other users' jobs to stall until these jobs
>     are over? If so, is there a way to specify that a user can take at
>     most some percentage of the total idle mappers at that time, and,
>     if that threshold is exceeded, to let the user run only some
>     default number of jobs at a time? This way, we can avoid stalling
>     other users' jobs and also utilize the cluster efficiently. Kindly
>     clarify.
>
>     Thanks
>     Pallavi
>
>
>
>     Todd Lipcon wrote:
>>     Hi Pallavi,
>>
>>     This doesn't sound right. Can you visit
>>     http://jobtracker:50030/scheduler?advanced and maybe send a
>>     screenshot? And also upload the allocations.xml file you're using?
>>
>>     It sounds like you've managed to set either userMaxJobsDefault or
>>     maxRunningJobs for that user to 1.
>>
>>     -Todd
>>
>>     On Thu, Jan 14, 2010 at 9:05 PM, Pallavi Palleti
>>     <pallavi.palleti@corp.aol.com> wrote:
>>
>>         Hi all,
>>
>>         I am experimenting with the fair scheduler on a cluster of 10
>>         machines. The users are given the default values ("0") for
>>         minMaps and minReduces in the fair scheduler parameters. When
>>         I run two jobs under the same username, the fair scheduler
>>         gives a 100% fair share to the first job (which needs 2
>>         mappers), while the second job (which needs 10 mappers) stays
>>         in waiting mode even though the cluster is completely idle.
>>         Running these jobs simultaneously would take only 10% of the
>>         total available mappers. However, the second job is not
>>         allowed to run until the first job is over. It would be great
>>         if someone could suggest some parameter tuning that allows
>>         efficient utilization of the cluster. By efficient, I mean
>>         allowing jobs to run when the cluster is idle rather than
>>         leaving them in waiting mode. I am not sure whether setting
>>         "minMaps" and "minReduces" for each user would resolve the
>>         issue. Kindly clarify.
>>
>>         Thanks
>>         Pallavi
>>
>>
>

Re: Query over efficient utilization of cluster using fair scheduling

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Pallavi,

If you remove userMaxJobsDefault, the default value is Integer.MAX_VALUE -
that is, it's unconstrained by this limit. This means that the other limits
and fair sharing would kick in if multiple jobs are submitted. So, if you
haven't set any of the min-slots, and the jobs are all at the same priority,
they'll share the number of slots equally. Please check out the fair
scheduler documentation in docs/fair_scheduler.pdf in your distro.

-Todd
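
To make "they'll share the number of slots equally" concrete, here is a minimal Java sketch of water-filling (max-min) sharing, assuming equal weights, no min shares, and whole slots. It is a simplified model for illustration, not the fair scheduler's actual code; the class name `FairShareSketch`, the method `fairShares`, and the 20- and 10-slot cluster sizes are hypothetical. A job that needs fewer slots than an equal split keeps only what it needs, and the leftover goes to the jobs that still want more.

```java
import java.util.Arrays;

public class FairShareSketch {
    // Water-filling: repeatedly split the remaining slots evenly among jobs
    // that still want more, capping each job at its demand. Equal weights,
    // no minimum shares, whole slots only (a simplified model).
    static int[] fairShares(int totalSlots, int[] demands) {
        int[] share = new int[demands.length];
        int remaining = totalSlots;
        while (remaining > 0) {
            // Count jobs whose demand is not yet met.
            int hungry = 0;
            for (int i = 0; i < demands.length; i++) {
                if (share[i] < demands[i]) hungry++;
            }
            if (hungry == 0) break; // every job is satisfied; extra slots stay idle
            // Even split of what is left, at least one slot per round.
            int perJob = Math.max(1, remaining / hungry);
            for (int i = 0; i < demands.length && remaining > 0; i++) {
                int give = Math.min(perJob, Math.min(demands[i] - share[i], remaining));
                if (give > 0) {
                    share[i] += give;
                    remaining -= give;
                }
            }
        }
        return share;
    }

    public static void main(String[] args) {
        int[] demands = {2, 10}; // map tasks needed by the two jobs in this thread
        // Hypothetical cluster sizes, chosen only for illustration.
        System.out.println(Arrays.toString(fairShares(20, demands))); // prints [2, 10]
        System.out.println(Arrays.toString(fairShares(10, demands))); // prints [2, 8]
    }
}
```

With 20 slots both the 2-map and the 10-map job get everything they ask for; with only 10 slots the small job still gets its 2 and the larger one absorbs the remaining 8 rather than being capped at an even 5.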

Re: Query over efficient utilization of cluster using fair scheduling

Posted by Pallavi Palleti <pa...@corp.aol.com>.
Hi Todd,

Thanks for the reply. I figured out that *userMaxJobsDefault* was set
to 1. I have another question on the same topic. What will happen if I
remove the *userMaxJobsDefault* property? What is the default value?
Would setting a value higher than 1 for a particular user cause other
users' jobs to stall until these jobs are over? If so, is there a way
to specify that a user can take at most some percentage of the total
idle mappers at that time, and, if that threshold is exceeded, to let
the user run only some default number of jobs at a time? This way, we
can avoid stalling other users' jobs and also utilize the cluster
efficiently. Kindly clarify.

Thanks
Pallavi


Re: Query over efficient utilization of cluster using fair scheduling

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Pallavi,

This doesn't sound right. Can you visit
http://jobtracker:50030/scheduler?advanced and maybe send a screenshot? And
also upload the allocations.xml file you're using?

It sounds like you've managed to set either userMaxJobsDefault or
maxRunningJobs for that user to 1.

-Todd
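
The effect described above can be modelled with a short Java sketch: jobs are admitted in submission order while their owner is below a per-user cap, where a per-user entry stands in for maxRunningJobs and the fallback value stands in for userMaxJobsDefault. This is a simplified illustration under those assumptions, not the fair scheduler's implementation; the class `UserJobLimitSketch`, the method `runnableJobs`, and the user and job names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UserJobLimitSketch {
    // A submitted job: the user who owns it plus a label for printing.
    record Job(String user, String name) {}

    // Walks jobs in submission order and admits each one only while its
    // owner is below that user's cap: the per-user value if present
    // (modelling maxRunningJobs), otherwise the default (userMaxJobsDefault).
    static List<Job> runnableJobs(List<Job> submitted,
                                  Map<String, Integer> perUserMax,
                                  int userMaxJobsDefault) {
        Map<String, Integer> running = new HashMap<>();
        List<Job> runnable = new ArrayList<>();
        for (Job job : submitted) {
            int cap = perUserMax.getOrDefault(job.user(), userMaxJobsDefault);
            int count = running.getOrDefault(job.user(), 0);
            if (count < cap) {
                running.put(job.user(), count + 1);
                runnable.add(job);
            }
            // Jobs over the cap are not admitted; they wait for a later pass.
        }
        return runnable;
    }

    public static void main(String[] args) {
        // Hypothetical names: two jobs submitted by the same user.
        List<Job> submitted = List.of(
                new Job("userA", "job1 (2 maps)"),
                new Job("userA", "job2 (10 maps)"));
        // userMaxJobsDefault = 1: only the first job is admitted.
        System.out.println(runnableJobs(submitted, Map.of(), 1).size()); // prints 1
        // Property removed: the cap falls back to Integer.MAX_VALUE.
        System.out.println(runnableJobs(submitted, Map.of(), Integer.MAX_VALUE).size()); // prints 2
    }
}
```

With a cap of 1 the second job from the same user waits even on an idle cluster, which matches the symptom in the original post; with an effectively unlimited cap both jobs are admitted, and slot allocation between them is then left to fair sharing.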
