Posted to common-user@hadoop.apache.org by abhishek sharma <ab...@usc.edu> on 2010/04/11 20:42:46 UTC

cluster under-utilization with Hadoop Fair Scheduler

Hi all,

I have been using the Hadoop Fair Scheduler for some experiments on a
100-node cluster with 2 map slots per node (hence, a total of 200 map
slots).

In one of my experiments, all the map tasks finish within a heartbeat
interval of 3 seconds. I noticed that the maximum number of concurrently
active map slots on my cluster never exceeds 100, and hence, the
cluster utilization during my experiments never exceeds 50% even when
large jobs with more than 1,000 maps are being executed.

A look at the Fair Scheduler code (in particular, the assignTasks
function) revealed the reason. As per my understanding, with the
implementation in Hadoop 0.20.0, a TaskTracker is not assigned more
than 1 map and 1 reduce task per heartbeat.

In my experiments, in every heartbeat, each TT has 2 free map slots
but is assigned only 1 map task, and hence, the utilization never goes
beyond 50%.
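
To make this concrete, here is a heavily simplified sketch of the
behavior as I understand it. This is not the actual Hadoop source; the
names (SingleAssignSketch, TaskTrackerStatus, pickRunnableMapTask) are
placeholders of mine:

import java.util.ArrayList;
import java.util.List;

// Simplified sketch (placeholder names, not the real Hadoop classes) of
// the one-map-task-per-heartbeat pattern I am describing.
public class SingleAssignSketch {

    static class Task {}

    static class TaskTrackerStatus {
        int maxMapSlots = 2;  // 2 map slots per node, as in my cluster
        int runningMaps = 0;  // both slots free at this heartbeat
    }

    // Fair Scheduler style (0.20.0, as I read it): at most one map task
    // is handed out per heartbeat, even when more slots are free.
    static List<Task> assignTasks(TaskTrackerStatus tts) {
        List<Task> assigned = new ArrayList<Task>();
        if (tts.runningMaps < tts.maxMapSlots) {
            Task t = pickRunnableMapTask();
            if (t != null) {
                assigned.add(t);  // stop after the first assignment
            }
        }
        return assigned;  // never more than one map task per heartbeat
    }

    // Stand-in for the scheduler's actual job/task selection logic.
    static Task pickRunnableMapTask() {
        return new Task();
    }

    public static void main(String[] args) {
        TaskTrackerStatus tts = new TaskTrackerStatus();
        // Prints 1 even though the tracker has 2 free map slots.
        System.out.println("maps assigned: " + assignTasks(tts).size());
    }
}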

Of course, this (degenerate) case does not arise when map tasks take
more than one heartbeat interval to finish. For example, I repeated
the experiments with map tasks taking close to 15 s to finish and
noticed close to 100% utilization when large jobs were executing.

Why does the Fair Scheduler not assign more than one map task to a TT
per heartbeat? Is this done to spread the load uniformly across the
cluster? I looked at the assignTasks function in the default Hadoop
scheduler (JobQueueTaskScheduler.java), and it does assign more than
1 map task per heartbeat to a TT.

It would be easy to change the Fair Scheduler to assign more than 1 map
task to a TT per heartbeat (I did that and achieved 100% utilization
even with small map tasks). But I am wondering if doing so will
violate some fairness properties.
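
For reference, the change I made amounts to looping until the tracker's
free map slots are exhausted, similar in spirit to what
JobQueueTaskScheduler does. This is a simplified sketch reusing the
placeholder types from my earlier sketch, not the actual patch:

// Modified assignTasks: keep assigning until the tracker's free map
// slots are exhausted or no runnable map task remains.
static List<Task> assignTasksGreedy(TaskTrackerStatus tts) {
    List<Task> assigned = new ArrayList<Task>();
    int freeSlots = tts.maxMapSlots - tts.runningMaps;
    for (int i = 0; i < freeSlots; i++) {
        Task t = pickRunnableMapTask();
        if (t == null) {
            break;  // nothing left to schedule
        }
        assigned.add(t);  // fill every free slot this heartbeat
    }
    return assigned;
}

With 2 free slots per TT, this fills both slots every heartbeat, which
matches the 100% utilization I saw.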

Thanks,
Abhishek

Re: cluster under-utilization with Hadoop Fair Scheduler

Posted by abhishek sharma <ab...@usc.edu>.
Hi Ted,

Were you referring to the Hadoop 0.20.2 distribution or the CDH version?

I just looked at the FairScheduler assignTasks function in the Hadoop
0.20.2 distribution, and it is the same as in version 0.20.0: it
assigns only 1 map and 1 reduce task to a TaskTracker per heartbeat
(as far as I can tell from reading the code and from my experiments).

Abhishek



On Sun, Apr 11, 2010 at 2:51 PM, Ted Yu <yu...@gmail.com> wrote:
> Reading assignTasks() in 0.20.2 reveals that the number of map tasks
> assigned is not limited to 1 per heartbeat.
>
> Cheers
>
> On Sun, Apr 11, 2010 at 12:30 PM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> Hi Abhishek,
>>
>> This behavior is improved by MAPREDUCE-706 I believe (not certain that
>> that's the JIRA, but I know it's fixed in trunk fairscheduler). These
>> patches are included in CDH3 (currently in beta)
>> http://archive.cloudera.com/cdh/3/
>>
>> In general, though, map tasks that are so short are not going to be very
>> efficient - even with fast assignment there is some constant overhead per
>> task.
>>
>> Thanks
>> -Todd
>>
>> On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma <ab...@usc.edu>
>> wrote:
>>
>> > Hi all,
>> >
>> > I have been using the Hadoop Fair Scheduler for some experiments on a
>> > 100-node cluster with 2 map slots per node (hence, a total of 200 map
>> > slots).
>> >
>> > In one of my experiments, all the map tasks finish within a heartbeat
>> > interval of 3 seconds. I noticed that the maximum number of concurrently
>> > active map slots on my cluster never exceeds 100, and hence, the
>> > cluster utilization during my experiments never exceeds 50% even when
>> > large jobs with more than 1,000 maps are being executed.
>> >
>> > A look at the Fair Scheduler code (in particular, the assignTasks
>> > function) revealed the reason. As per my understanding, with the
>> > implementation in Hadoop 0.20.0, a TaskTracker is not assigned more
>> > than 1 map and 1 reduce task per heartbeat.
>> >
>> > In my experiments, in every heartbeat, each TT has 2 free map slots
>> > but is assigned only 1 map task, and hence, the utilization never goes
>> > beyond 50%.
>> >
>> > Of course, this (degenerate) case does not arise when map tasks take
>> > more than one heartbeat interval to finish. For example, I repeated
>> > the experiments with map tasks taking close to 15 s to finish and
>> > noticed close to 100% utilization when large jobs were executing.
>> >
>> > Why does the Fair Scheduler not assign more than one map task to a TT
>> > per heartbeat? Is this done to spread the load uniformly across the
>> > cluster? I looked at the assignTasks function in the default Hadoop
>> > scheduler (JobQueueTaskScheduler.java), and it does assign more than
>> > 1 map task per heartbeat to a TT.
>> >
>> > It would be easy to change the Fair Scheduler to assign more than 1 map
>> > task to a TT per heartbeat (I did that and achieved 100% utilization
>> > even with small map tasks). But I am wondering if doing so will
>> > violate some fairness properties.
>> >
>> > Thanks,
>> > Abhishek
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>

Re: cluster under-utilization with Hadoop Fair Scheduler

Posted by abhishek sharma <ab...@usc.edu>.
Hi Ted,

I was referring to version 0.20.0. As Todd pointed out, the issue I
described has since been fixed.

I only looked at the Cloudera version 0.20.2+228
(http://archive.cloudera.com/cdh/3/), currently in beta.

I guess Hadoop 0.20.2 also has the fix. I will take a look at that too.

Thanks,
Abhishek

On Sun, Apr 11, 2010 at 2:51 PM, Ted Yu <yu...@gmail.com> wrote:
> Reading assignTasks() in 0.20.2 reveals that the number of map tasks
> assigned is not limited to 1 per heartbeat.
>
> Cheers
>
> On Sun, Apr 11, 2010 at 12:30 PM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> Hi Abhishek,
>>
>> This behavior is improved by MAPREDUCE-706 I believe (not certain that
>> that's the JIRA, but I know it's fixed in trunk fairscheduler). These
>> patches are included in CDH3 (currently in beta)
>> http://archive.cloudera.com/cdh/3/
>>
>> In general, though, map tasks that are so short are not going to be very
>> efficient - even with fast assignment there is some constant overhead per
>> task.
>>
>> Thanks
>> -Todd
>>
>> On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma <ab...@usc.edu>
>> wrote:
>>
>> > Hi all,
>> >
>> > I have been using the Hadoop Fair Scheduler for some experiments on a
>> > 100-node cluster with 2 map slots per node (hence, a total of 200 map
>> > slots).
>> >
>> > In one of my experiments, all the map tasks finish within a heartbeat
>> > interval of 3 seconds. I noticed that the maximum number of concurrently
>> > active map slots on my cluster never exceeds 100, and hence, the
>> > cluster utilization during my experiments never exceeds 50% even when
>> > large jobs with more than 1,000 maps are being executed.
>> >
>> > A look at the Fair Scheduler code (in particular, the assignTasks
>> > function) revealed the reason. As per my understanding, with the
>> > implementation in Hadoop 0.20.0, a TaskTracker is not assigned more
>> > than 1 map and 1 reduce task per heartbeat.
>> >
>> > In my experiments, in every heartbeat, each TT has 2 free map slots
>> > but is assigned only 1 map task, and hence, the utilization never goes
>> > beyond 50%.
>> >
>> > Of course, this (degenerate) case does not arise when map tasks take
>> > more than one heartbeat interval to finish. For example, I repeated
>> > the experiments with map tasks taking close to 15 s to finish and
>> > noticed close to 100% utilization when large jobs were executing.
>> >
>> > Why does the Fair Scheduler not assign more than one map task to a TT
>> > per heartbeat? Is this done to spread the load uniformly across the
>> > cluster? I looked at the assignTasks function in the default Hadoop
>> > scheduler (JobQueueTaskScheduler.java), and it does assign more than
>> > 1 map task per heartbeat to a TT.
>> >
>> > It would be easy to change the Fair Scheduler to assign more than 1 map
>> > task to a TT per heartbeat (I did that and achieved 100% utilization
>> > even with small map tasks). But I am wondering if doing so will
>> > violate some fairness properties.
>> >
>> > Thanks,
>> > Abhishek
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>

Re: cluster under-utilization with Hadoop Fair Scheduler

Posted by Ted Yu <yu...@gmail.com>.
Reading assignTasks() in 0.20.2 reveals that the number of map tasks
assigned is not limited to 1 per heartbeat.

Cheers

On Sun, Apr 11, 2010 at 12:30 PM, Todd Lipcon <to...@cloudera.com> wrote:

> Hi Abhishek,
>
> This behavior is improved by MAPREDUCE-706 I believe (not certain that
> that's the JIRA, but I know it's fixed in trunk fairscheduler). These
> patches are included in CDH3 (currently in beta)
> http://archive.cloudera.com/cdh/3/
>
> In general, though, map tasks that are so short are not going to be very
> efficient - even with fast assignment there is some constant overhead per
> task.
>
> Thanks
> -Todd
>
> On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma <ab...@usc.edu>
> wrote:
>
> > Hi all,
> >
> > I have been using the Hadoop Fair Scheduler for some experiments on a
> > 100-node cluster with 2 map slots per node (hence, a total of 200 map
> > slots).
> >
> > In one of my experiments, all the map tasks finish within a heartbeat
> > interval of 3 seconds. I noticed that the maximum number of concurrently
> > active map slots on my cluster never exceeds 100, and hence, the
> > cluster utilization during my experiments never exceeds 50% even when
> > large jobs with more than 1,000 maps are being executed.
> >
> > A look at the Fair Scheduler code (in particular, the assignTasks
> > function) revealed the reason. As per my understanding, with the
> > implementation in Hadoop 0.20.0, a TaskTracker is not assigned more
> > than 1 map and 1 reduce task per heartbeat.
> >
> > In my experiments, in every heartbeat, each TT has 2 free map slots
> > but is assigned only 1 map task, and hence, the utilization never goes
> > beyond 50%.
> >
> > Of course, this (degenerate) case does not arise when map tasks take
> > more than one heartbeat interval to finish. For example, I repeated
> > the experiments with map tasks taking close to 15 s to finish and
> > noticed close to 100% utilization when large jobs were executing.
> >
> > Why does the Fair Scheduler not assign more than one map task to a TT
> > per heartbeat? Is this done to spread the load uniformly across the
> > cluster? I looked at the assignTasks function in the default Hadoop
> > scheduler (JobQueueTaskScheduler.java), and it does assign more than
> > 1 map task per heartbeat to a TT.
> >
> > It would be easy to change the Fair Scheduler to assign more than 1 map
> > task to a TT per heartbeat (I did that and achieved 100% utilization
> > even with small map tasks). But I am wondering if doing so will
> > violate some fairness properties.
> >
> > Thanks,
> > Abhishek
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: cluster under-utilization with Hadoop Fair Scheduler

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Abhishek,

This behavior is improved by MAPREDUCE-706 I believe (not certain that
that's the JIRA, but I know it's fixed in trunk fairscheduler). These
patches are included in CDH3 (currently in beta)
http://archive.cloudera.com/cdh/3/

In general, though, map tasks that are so short are not going to be very
efficient - even with fast assignment there is some constant overhead per
task.

Thanks
-Todd

On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma <ab...@usc.edu> wrote:

> Hi all,
>
> I have been using the Hadoop Fair Scheduler for some experiments on a
> 100-node cluster with 2 map slots per node (hence, a total of 200 map
> slots).
>
> In one of my experiments, all the map tasks finish within a heartbeat
> interval of 3 seconds. I noticed that the maximum number of concurrently
> active map slots on my cluster never exceeds 100, and hence, the
> cluster utilization during my experiments never exceeds 50% even when
> large jobs with more than 1,000 maps are being executed.
>
> A look at the Fair Scheduler code (in particular, the assignTasks
> function) revealed the reason. As per my understanding, with the
> implementation in Hadoop 0.20.0, a TaskTracker is not assigned more
> than 1 map and 1 reduce task per heartbeat.
>
> In my experiments, in every heartbeat, each TT has 2 free map slots
> but is assigned only 1 map task, and hence, the utilization never goes
> beyond 50%.
>
> Of course, this (degenerate) case does not arise when map tasks take
> more than one heartbeat interval to finish. For example, I repeated
> the experiments with map tasks taking close to 15 s to finish and
> noticed close to 100% utilization when large jobs were executing.
>
> Why does the Fair Scheduler not assign more than one map task to a TT
> per heartbeat? Is this done to spread the load uniformly across the
> cluster? I looked at the assignTasks function in the default Hadoop
> scheduler (JobQueueTaskScheduler.java), and it does assign more than
> 1 map task per heartbeat to a TT.
>
> It would be easy to change the Fair Scheduler to assign more than 1 map
> task to a TT per heartbeat (I did that and achieved 100% utilization
> even with small map tasks). But I am wondering if doing so will
> violate some fairness properties.
>
> Thanks,
> Abhishek
>



-- 
Todd Lipcon
Software Engineer, Cloudera
