Posted to mapreduce-user@hadoop.apache.org by Pallavi Palleti <pa...@corp.aol.com> on 2010/01/15 06:05:45 UTC
Query over efficient utilization of cluster using fair scheduling
Hi all,
I am experimenting with the fair scheduler on a cluster of 10 machines. The
users are given the default value ("0") for minMaps and minReduces in the
fair scheduler parameters. When I run two jobs under the same username, the
fair scheduler gives a 100% fair share to the first job (which needs 2
mappers), and the second job (which needs 10 mappers) waits even though the
cluster is otherwise idle. Running both jobs simultaneously would use only
10% of the total available mappers, yet the second job is not allowed to run
until the first job is over. It would be great if someone could suggest some
parameter tuning that allows efficient utilization of the cluster. By
efficient, I mean letting jobs run when the cluster is idle rather than
leaving them waiting. I am not sure whether setting "minMaps" and
"minReduces" for each user would resolve the issue. Kindly clarify.
Thanks
Pallavi
Re: Query over efficient utilization of cluster using fair scheduling
Posted by Pallavi Palleti <pa...@corp.aol.com>.
Thanks Todd. I had gone through the documentation earlier, but these points
were not very clear to me. This will help me experiment further. Thanks for
the information. :-)
Regards
Pallavi
Todd Lipcon wrote:
> Hi Pallavi,
>
> If you remove userMaxJobsDefault, the default value is
> Integer.MAX_VALUE - that is, it's unconstrained by this limit. This
> means that the other limits and fair sharing would kick in if multiple
> jobs are submitted. So, if you haven't set any of the min-slots, and
> the jobs are all at the same priority, they'll share the number of
> slots equally. Please check out the fair scheduler documentation in
> docs/fair_scheduler.pdf in your distro.
>
> -Todd
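Todd's point that jobs at the same priority, with no minimum slots set, share the slots equally can be illustrated with a small Java model. This is a simplified sketch, not the fair scheduler's own code: it assumes equal weights, no minimum shares and no running-job limits, and it caps each job at its demand so that slots a small job cannot use are passed on to the others (a max-min, or "water-filling", split, which is an assumption here about how leftover slots are handled). The class name FairShareSketch and the slot counts are hypothetical; the thread does not give the cluster's slot count.

```java
import java.util.Arrays;

/** Hypothetical model of equal (max-min) fair sharing of map slots; not Hadoop code. */
public class FairShareSketch {

    /**
     * Splits totalSlots among jobs with the given demands (tasks still to run).
     * Each round divides the remaining slots equally among jobs whose demand is
     * not yet met; no job ever receives more than its demand.
     */
    public static int[] fairShares(int totalSlots, int[] demands) {
        int[] shares = new int[demands.length];
        int remaining = totalSlots;
        while (remaining > 0) {
            // Count jobs whose demand is not yet met.
            int hungry = 0;
            for (int i = 0; i < demands.length; i++) {
                if (shares[i] < demands[i]) hungry++;
            }
            if (hungry == 0) break; // every job already has all the slots it can use
            // Equal slice per hungry job; at least 1 so indivisible leftovers still go out.
            int slice = Math.max(1, remaining / hungry);
            for (int i = 0; i < demands.length && remaining > 0; i++) {
                int give = Math.min(slice, Math.min(demands[i] - shares[i], remaining));
                if (give > 0) {
                    shares[i] += give;
                    remaining -= give;
                }
            }
        }
        return shares;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 20 map slots, a 2-task job and a 10-task job.
        System.out.println(Arrays.toString(fairShares(20, new int[] {2, 10}))); // [2, 10]
        // Under contention two 30-task jobs split the 20 slots evenly.
        System.out.println(Arrays.toString(fairShares(20, new int[] {30, 30}))); // [10, 10]
    }
}
```

With 20 slots, both jobs from the original question get their full demand at once; only when total demand exceeds the cluster do the jobs split it evenly.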
Re: Query over efficient utilization of cluster using fair scheduling
Posted by Todd Lipcon <to...@cloudera.com>.
Hi Pallavi,
If you remove userMaxJobsDefault, the default value is Integer.MAX_VALUE -
that is, it's unconstrained by this limit. This means that the other limits
and fair sharing would kick in if multiple jobs are submitted. So, if you
haven't set any of the min-slots, and the jobs are all at the same priority,
they'll share the number of slots equally. Please check out the fair
scheduler documentation in docs/fair_scheduler.pdf in your distro.
-Todd
On Fri, Jan 15, 2010 at 1:15 AM, Pallavi Palleti <
pallavi.palleti@corp.aol.com> wrote:
> Hi Todd,
>
> Thanks for the reply. I found that *userMaxJobsDefault* was set to 1. I
> have another question about the same setting. What happens if I remove the
> *userMaxJobsDefault* property? What is its default value? Would setting a
> value higher than 1 for a particular user cause other users' jobs to stall
> until that user's jobs finish? If so, is there a way to let a user take at
> most some percentage of the mappers that are idle at that time and, once
> that threshold is exceeded, allow the user to run only some default number
> of jobs at a time? That way, we could avoid stalling other users' jobs and
> still utilize the cluster efficiently. Kindly clarify.
>
> Thanks
> Pallavi
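The lookup Todd describes (no userMaxJobsDefault means Integer.MAX_VALUE) can be sketched in Java. This assumes the allocation-file elements named in the fair scheduler documentation (<allocations>, <user name="...">, <maxRunningJobs>, <userMaxJobsDefault>) and that a user's own maxRunningJobs takes precedence over userMaxJobsDefault; it ignores pool-level limits. UserJobLimitSketch and effectiveUserLimit are hypothetical names, not a Hadoop API, and this is not how Hadoop itself loads the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper, not Hadoop code: resolves one user's running-job limit. */
public class UserJobLimitSketch {

    /**
     * Returns the user's own maxRunningJobs if set, else the top-level
     * userMaxJobsDefault if set, else Integer.MAX_VALUE (no limit).
     */
    public static int effectiveUserLimit(String allocationsXml, String user) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(allocationsXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse allocation XML", e);
        }
        Element root = doc.getDocumentElement(); // expected to be <allocations>

        // 1. A matching <user name="..."> element with its own <maxRunningJobs> wins.
        NodeList users = root.getElementsByTagName("user");
        for (int i = 0; i < users.getLength(); i++) {
            Element u = (Element) users.item(i);
            NodeList max = u.getElementsByTagName("maxRunningJobs");
            if (user.equals(u.getAttribute("name")) && max.getLength() > 0) {
                return Integer.parseInt(max.item(0).getTextContent().trim());
            }
        }
        // 2. Otherwise use the cluster-wide <userMaxJobsDefault>, if present.
        NodeList def = root.getElementsByTagName("userMaxJobsDefault");
        if (def.getLength() > 0) {
            return Integer.parseInt(def.item(0).getTextContent().trim());
        }
        // 3. Neither is set: this limit does not constrain the user.
        return Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        String limited = "<allocations><userMaxJobsDefault>1</userMaxJobsDefault></allocations>";
        String unlimited = "<allocations></allocations>";
        System.out.println(effectiveUserLimit(limited, "alice"));   // 1
        System.out.println(effectiveUserLimit(unlimited, "alice")); // 2147483647
    }
}
```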
Re: Query over efficient utilization of cluster using fair scheduling
Posted by Pallavi Palleti <pa...@corp.aol.com>.
Hi Todd,
Thanks for the reply. I found that *userMaxJobsDefault* was set to 1. I have
another question about the same setting. What happens if I remove the
*userMaxJobsDefault* property? What is its default value? Would setting a
value higher than 1 for a particular user cause other users' jobs to stall
until that user's jobs finish? If so, is there a way to let a user take at
most some percentage of the mappers that are idle at that time and, once
that threshold is exceeded, allow the user to run only some default number
of jobs at a time? That way, we could avoid stalling other users' jobs and
still utilize the cluster efficiently. Kindly clarify.
Thanks
Pallavi
Todd Lipcon wrote:
> Hi Pallavi,
>
> This doesn't sound right. Can you visit
> http://jobtracker:50030/scheduler?advanced and maybe send a
> screenshot? And also upload the allocations.xml file you're using?
>
> It sounds like you've managed to set either userMaxJobsDefault or
> maxRunningJobs for that user to 1.
>
> -Todd
Re: Query over efficient utilization of cluster using fair scheduling
Posted by Todd Lipcon <to...@cloudera.com>.
Hi Pallavi,
This doesn't sound right. Can you visit
http://jobtracker:50030/scheduler?advanced and maybe send a screenshot? And
also upload the allocations.xml file you're using?
It sounds like you've managed to set either userMaxJobsDefault or
maxRunningJobs for that user to 1.
-Todd
On Thu, Jan 14, 2010 at 9:05 PM, Pallavi Palleti <
pallavi.palleti@corp.aol.com> wrote:
> Hi all,
>
> I am experimenting with the fair scheduler on a cluster of 10 machines. The
> users are given the default value ("0") for minMaps and minReduces in the
> fair scheduler parameters. When I run two jobs under the same username, the
> fair scheduler gives a 100% fair share to the first job (which needs 2
> mappers), and the second job (which needs 10 mappers) waits even though the
> cluster is otherwise idle. Running both jobs simultaneously would use only
> 10% of the total available mappers, yet the second job is not allowed to run
> until the first job is over. It would be great if someone could suggest some
> parameter tuning that allows efficient utilization of the cluster. By
> efficient, I mean letting jobs run when the cluster is idle rather than
> leaving them waiting. I am not sure whether setting "minMaps" and
> "minReduces" for each user would resolve the issue. Kindly clarify.
>
> Thanks
> Pallavi
>
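Todd's diagnosis, a running-job limit of 1 for the user, explains why the second job waits on an idle cluster. The toy Java model below assumes jobs are admitted strictly in submission order and ignores priorities, pools and slot counts; UserLimitSketch, runnableJobs and the user name are hypothetical, and this is not the scheduler's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a per-user running-job limit; not Hadoop's scheduler code. */
public class UserLimitSketch {

    /** A submitted job: the user who owns it and a label for printing. */
    public static final class Job {
        final String user;
        final String name;

        public Job(String user, String name) {
            this.user = user;
            this.name = name;
        }
    }

    /**
     * Walks jobs in submission order and admits a job only while its user has
     * fewer than userLimit admitted jobs; every other job stays queued.
     */
    public static List<String> runnableJobs(List<Job> submitted, int userLimit) {
        Map<String, Integer> admittedPerUser = new HashMap<>();
        List<String> runnable = new ArrayList<>();
        for (Job job : submitted) {
            int admitted = admittedPerUser.getOrDefault(job.user, 0);
            if (admitted < userLimit) {
                admittedPerUser.put(job.user, admitted + 1);
                runnable.add(job.name);
            }
            // A job over the limit waits for one of the same user's jobs to finish,
            // no matter how many slots are idle.
        }
        return runnable;
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(new Job("alice", "job1"), new Job("alice", "job2"));
        System.out.println(runnableJobs(jobs, 1));                 // [job1]
        System.out.println(runnableJobs(jobs, Integer.MAX_VALUE)); // [job1, job2]
    }
}
```

Raising or removing the limit lets both jobs be admitted, after which fair sharing decides how the slots are divided between them.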