Posted to common-user@hadoop.apache.org by He Chen <ai...@gmail.com> on 2011/01/15 06:45:56 UTC

Question about Hadoop Default FCFS Job Scheduler

Hey all

Why does the FCFS scheduler only let a node choose one task at a time from
one job? To increase data locality, it would be reasonable to let a node
take all of its local tasks (if it can) from a job at once.

Any reply will be appreciated.

Thanks

Chen

Re: Question about Hadoop Default FCFS Job Scheduler

Posted by Nan Zhu <zh...@gmail.com>.
OK, I got your point: you mean, why don't we move the for loop into
obtainNewLocalMapTask()?

Yes, I think we could do that, but the result would be the same as the
current code, and I don't think it would bring much performance benefit.
Personally, I also like the current style. :-)
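For what it's worth, a batched variant of that idea might look something
like the sketch below. It is only an illustration: the class, the task
strings, and the obtainNewLocalMapTasks(int) method are hypothetical names,
not part of the real JobInProgress API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class BatchedSketch {
    // Hypothetical job: holds the map tasks local to the requesting node.
    private final Deque<String> localTasks;

    public BatchedSketch(String... tasks) {
        localTasks = new ArrayDeque<>(Arrays.asList(tasks));
    }

    // Batched variant: hand out up to maxTasks local tasks in a single call,
    // instead of one task per iteration of the caller's for loop.
    public List<String> obtainNewLocalMapTasks(int maxTasks) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < maxTasks && !localTasks.isEmpty(); ++i) {
            out.add(localTasks.poll());
        }
        return out;
    }

    public static void main(String[] args) {
        BatchedSketch job = new BatchedSketch("t1", "t2", "t3");
        // Ask for up to 5 tasks; only 3 are available.
        System.out.println(job.obtainNewLocalMapTasks(5));
    }
}
```

As Nan says, the assignments come out the same either way; the batched form
only saves some loop iterations in the caller.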

Best,

Nan

On Tue, Jan 18, 2011 at 12:24 AM, He Chen <ai...@gmail.com> wrote:

> Hi Nan,
>
> Thank you for the reply. I understand what you mean. What concerns me is
> that inside the "obtainNewLocalMapTask(...)" method, only one task is
> assigned at a time.
>
> Now I understand why it only assigns one task at a time: it is because of
> the outer loop:
>
> for (i = 0; i < MapperCapacity; ++i){
>
> (......)
>
> }
>
> What I mean is: why does this loop exist here? Why does the scheduler use
> this kind of loop? Assigning only one task at a time imposes overhead on
> the task-assignment process. Obviously, a node could be assigned all the
> available local tasks it can afford in a single
> "obtainNewLocalMapTask(......)" method call.
>
> Bests
>
> Chen
>
> On Mon, Jan 17, 2011 at 8:28 AM, Nan Zhu <zh...@gmail.com> wrote:
>
> > Hi, Chen
> >
> > How is it going recently?
> >
> > Actually, I think you misunderstand the code in assignTasks() in
> > JobQueueTaskScheduler.java; see the following structure of the
> > interesting code:
> >
> > // I'm sorry, I hacked the code so much; the variable names may differ
> > // from the original version
> >
> > for (i = 0; i < MapperCapacity; ++i){
> >   ...
> >   for (JobInProgress job : jobQueue){
> >       // try to schedule a node-local or rack-local map task
> >       // here is the interesting place
> >       t = job.obtainNewLocalMapTask(...);
> >       if (t != null){
> >          ...
> >          // this break sends control back to the outer loop, whose next
> >          // iteration re-enters "for (job : jobQueue)" from the first
> >          // job; so the scheduler actually assigns all of the first
> >          // job's local map tasks first, until the map slots are full
> >          break;
> >       }
> >   }
> > }
> >
> > BTW, only one reduce task can be scheduled per heartbeat.
> >
> >
> >
> > Best,
> > Nan
> > On Sat, Jan 15, 2011 at 1:45 PM, He Chen <ai...@gmail.com> wrote:
> >
> > > Hey all
> > >
> > > Why does the FCFS scheduler only let a node choose one task at a time
> > > from one job? To increase data locality, it would be reasonable to let
> > > a node take all of its local tasks (if it can) from a job at once.
> > >
> > > Any reply will be appreciated.
> > >
> > > Thanks
> > >
> > > Chen
> > >
> >
>

Re: Question about Hadoop Default FCFS Job Scheduler

Posted by Nan Zhu <zh...@gmail.com>.
Hi, Chen

Actually, it is not one task each time; see this statement:

 assignedTasks.add(t);

assignedTasks is the return value of this method, and it is a collection of
the selected tasks; it will contain multiple tasks if the candidates are
there.

Best,

Nan

On Tue, Jan 18, 2011 at 12:24 AM, He Chen <ai...@gmail.com> wrote:

> Hi Nan,
>
> Thank you for the reply. I understand what you mean. What concerns me is
> that inside the "obtainNewLocalMapTask(...)" method, only one task is
> assigned at a time.
>
> Now I understand why it only assigns one task at a time: it is because of
> the outer loop:
>
> for (i = 0; i < MapperCapacity; ++i){
>
> (......)
>
> }
>
> What I mean is: why does this loop exist here? Why does the scheduler use
> this kind of loop? Assigning only one task at a time imposes overhead on
> the task-assignment process. Obviously, a node could be assigned all the
> available local tasks it can afford in a single
> "obtainNewLocalMapTask(......)" method call.
>
> Bests
>
> Chen
>
> On Mon, Jan 17, 2011 at 8:28 AM, Nan Zhu <zh...@gmail.com> wrote:
>
> > Hi, Chen
> >
> > How is it going recently?
> >
> > Actually, I think you misunderstand the code in assignTasks() in
> > JobQueueTaskScheduler.java; see the following structure of the
> > interesting code:
> >
> > // I'm sorry, I hacked the code so much; the variable names may differ
> > // from the original version
> >
> > for (i = 0; i < MapperCapacity; ++i){
> >   ...
> >   for (JobInProgress job : jobQueue){
> >       // try to schedule a node-local or rack-local map task
> >       // here is the interesting place
> >       t = job.obtainNewLocalMapTask(...);
> >       if (t != null){
> >          ...
> >          // this break sends control back to the outer loop, whose next
> >          // iteration re-enters "for (job : jobQueue)" from the first
> >          // job; so the scheduler actually assigns all of the first
> >          // job's local map tasks first, until the map slots are full
> >          break;
> >       }
> >   }
> > }
> >
> > BTW, only one reduce task can be scheduled per heartbeat.
> >
> >
> >
> > Best,
> > Nan
> > On Sat, Jan 15, 2011 at 1:45 PM, He Chen <ai...@gmail.com> wrote:
> >
> > > Hey all
> > >
> > > Why does the FCFS scheduler only let a node choose one task at a time
> > > from one job? To increase data locality, it would be reasonable to let
> > > a node take all of its local tasks (if it can) from a job at once.
> > >
> > > Any reply will be appreciated.
> > >
> > > Thanks
> > >
> > > Chen
> > >
> >
>

Re: Question about Hadoop Default FCFS Job Scheduler

Posted by He Chen <ai...@gmail.com>.
Hi Nan,

Thank you for the reply. I understand what you mean. What concerns me is
that inside the "obtainNewLocalMapTask(...)" method, only one task is
assigned at a time.

Now I understand why it only assigns one task at a time: it is because of
the outer loop:

for (i = 0; i < MapperCapacity; ++i){

(......)

}

What I mean is: why does this loop exist here? Why does the scheduler use
this kind of loop? Assigning only one task at a time imposes overhead on
the task-assignment process. Obviously, a node could be assigned all the
available local tasks it can afford in a single
"obtainNewLocalMapTask(......)" method call.

Bests

Chen

On Mon, Jan 17, 2011 at 8:28 AM, Nan Zhu <zh...@gmail.com> wrote:

> Hi, Chen
>
> How is it going recently?
>
> Actually, I think you misunderstand the code in assignTasks() in
> JobQueueTaskScheduler.java; see the following structure of the
> interesting code:
>
> // I'm sorry, I hacked the code so much; the variable names may differ
> // from the original version
>
> for (i = 0; i < MapperCapacity; ++i){
>   ...
>   for (JobInProgress job : jobQueue){
>       // try to schedule a node-local or rack-local map task
>       // here is the interesting place
>       t = job.obtainNewLocalMapTask(...);
>       if (t != null){
>          ...
>          // this break sends control back to the outer loop, whose next
>          // iteration re-enters "for (job : jobQueue)" from the first
>          // job; so the scheduler actually assigns all of the first
>          // job's local map tasks first, until the map slots are full
>          break;
>       }
>   }
> }
>
> BTW, only one reduce task can be scheduled per heartbeat.
>
>
>
> Best,
> Nan
> On Sat, Jan 15, 2011 at 1:45 PM, He Chen <ai...@gmail.com> wrote:
>
> > Hey all
> >
> > Why does the FCFS scheduler only let a node choose one task at a time
> > from one job? To increase data locality, it would be reasonable to let
> > a node take all of its local tasks (if it can) from a job at once.
> >
> > Any reply will be appreciated.
> >
> > Thanks
> >
> > Chen
> >
>

Re: Question about Hadoop Default FCFS Job Scheduler

Posted by Nan Zhu <zh...@gmail.com>.
Hi, Chen

How is it going recently?

Actually, I think you misunderstand the code in assignTasks() in
JobQueueTaskScheduler.java; see the following structure of the
interesting code:

// I'm sorry, I hacked the code so much; the variable names may differ
// from the original version

for (i = 0; i < MapperCapacity; ++i){
   ...
   for (JobInProgress job : jobQueue){
       // try to schedule a node-local or rack-local map task
       // here is the interesting place
       t = job.obtainNewLocalMapTask(...);
       if (t != null){
          ...
          // this break sends control back to the outer loop, whose next
          // iteration re-enters "for (job : jobQueue)" from the first job;
          // so the scheduler actually assigns all of the first job's local
          // map tasks first, until the map slots are full
          break;
       }
   }
}

BTW, only one reduce task can be scheduled per heartbeat.
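To make that control flow concrete, here is a small, self-contained,
runnable sketch of the same loop structure. The Job class, the task name
strings, and this assignTasks signature are simplified stand-ins for
illustration only, not the real JobInProgress/JobQueueTaskScheduler APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class FcfsSketch {
    // Hypothetical stand-in for JobInProgress: each job holds a queue of
    // map tasks that are local to the requesting node.
    static class Job {
        final Deque<String> localTasks;
        Job(String... tasks) {
            localTasks = new ArrayDeque<>(Arrays.asList(tasks));
        }
        // Mirrors obtainNewLocalMapTask(): hands out at most ONE task per call.
        String obtainNewLocalMapTask() {
            return localTasks.poll(); // null when no local task is left
        }
    }

    // Mirrors the assignTasks() structure: one map slot per outer iteration;
    // after every assignment the inner scan restarts from the FIRST job,
    // which is what preserves FCFS order across jobs.
    static List<String> assignTasks(List<Job> jobQueue, int mapperCapacity) {
        List<String> assignedTasks = new ArrayList<>();
        for (int i = 0; i < mapperCapacity; ++i) {
            for (Job job : jobQueue) {
                String t = job.obtainNewLocalMapTask();
                if (t != null) {
                    assignedTasks.add(t);
                    break; // back to the slot loop -> rescan from the first job
                }
            }
        }
        return assignedTasks;
    }

    public static void main(String[] args) {
        List<Job> queue = Arrays.asList(
                new Job("j1-t1", "j1-t2"),
                new Job("j2-t1", "j2-t2"));
        // Three free slots: the first job's local tasks drain before the
        // second job gets a turn, yet assignedTasks still ends up holding
        // several tasks from one heartbeat.
        System.out.println(assignTasks(queue, 3));
    }
}
```

Running this prints [j1-t1, j1-t2, j2-t1]: the break does not end the
heartbeat, it only restarts the job scan, so multiple tasks are handed out
while the first job keeps priority.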



Best,
Nan
On Sat, Jan 15, 2011 at 1:45 PM, He Chen <ai...@gmail.com> wrote:

> Hey all
>
> Why does the FCFS scheduler only let a node choose one task at a time
> from one job? To increase data locality, it would be reasonable to let
> a node take all of its local tasks (if it can) from a job at once.
>
> Any reply will be appreciated.
>
> Thanks
>
> Chen
>