Posted to common-user@hadoop.apache.org by He Chen <ai...@gmail.com> on 2011/01/15 06:45:56 UTC
Question about Hadoop Default FCFS Job Scheduler
Hey all
Why does the FCFS scheduler only let a node choose one task at a time from
one job? To increase data locality, it would be reasonable to let a node
take all of its local tasks (if it can) from a job at once.
Any reply will be appreciated.
Thanks
Chen
Re: Question about Hadoop Default FCFS Job Scheduler
Posted by Nan Zhu <zh...@gmail.com>.
OK, I see your point: you are asking why we don't move the for loop into
obtainNewLocalMapTask(). Yes, we could do that, but the result would be the
same as with the current code, and I don't think it would bring much
performance benefit. Personally, I prefer the current style. :-)
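For what it's worth, the equivalence can be illustrated with a toy sketch
(hypothetical names and simplified types, not the actual Hadoop classes):
whether the capacity loop stays in the scheduler or moves inside the job's
task-selection method, the same tasks come back.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a job holding some number of node-local map tasks.
// The method names only mirror the Hadoop originals for readability.
class ToyJob {
    int localTasksLeft;
    ToyJob(int n) { localTasksLeft = n; }

    // One-at-a-time style, like obtainNewLocalMapTask(): returns a
    // task id, or null when no local task remains.
    Integer obtainOne() {
        return localTasksLeft > 0 ? --localTasksLeft : null;
    }

    // Batch style: the capacity loop moved inside the method.
    List<Integer> obtainUpTo(int capacity) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < capacity; ++i) {
            Integer t = obtainOne();
            if (t == null) break;
            out.add(t);
        }
        return out;
    }
}

public class LoopPlacement {
    // Loop kept in the scheduler, one call per slot.
    static List<Integer> scheduleOneAtATime(ToyJob job, int capacity) {
        List<Integer> assigned = new ArrayList<>();
        for (int i = 0; i < capacity; ++i) {
            Integer t = job.obtainOne();
            if (t == null) break;
            assigned.add(t);
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<Integer> a = scheduleOneAtATime(new ToyJob(3), 5);
        List<Integer> b = new ToyJob(3).obtainUpTo(5);
        System.out.println(a.equals(b)); // prints true: same result either way
    }
}
```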
Best,
Nan
Re: Question about Hadoop Default FCFS Job Scheduler
Posted by Nan Zhu <zh...@gmail.com>.
Hi, Chen
Actually, it is not just one task each time. See this statement:
assignedTasks.add(t);
assignedTasks is the return value of this method; it is a collection of the
selected tasks, and it will contain multiple tasks when candidates are
available.
Best,
Nan
Re: Question about Hadoop Default FCFS Job Scheduler
Posted by He Chen <ai...@gmail.com>.
Hi Nan,
Thank you for the reply. I understand what you mean. My concern is that
inside the obtainNewLocalMapTask(...) method, only one task is assigned at
a time.
Now I understand why it only assigns one task at a time: it is because of
the outer loop:
for (i = 0; i < MapperCapacity; ++i){
(......)
}
What I mean is: why does this loop exist here? Why does the scheduler use
this kind of loop? Assigning only one task at a time adds overhead to the
task-assignment process. Obviously, a node could be assigned all the local
tasks it can afford in a single obtainNewLocalMapTask(......) method call.
Bests
Chen
Re: Question about Hadoop Default FCFS Job Scheduler
Posted by Nan Zhu <zh...@gmail.com>.
Hi, Chen
How is it going recently?
Actually, I think you misunderstand the code in assignTasks() in
JobQueueTaskScheduler.java. See the following structure of the interesting
code:
// I'm sorry, I hacked the code a lot, so the variable names may be
// different from the original version
for (i = 0; i < MapperCapacity; ++i){
  ...
  for (JobInProgress job : jobQueue){
    // try to schedule a node-local or rack-local map task
    // here is the interesting place
    t = job.obtainNewLocalMapTask(...);
    if (t != null){
      ...
      break; // this break sends control back to the outer loop, which
             // restarts the map-task selection from the first job in the
             // queue; so the scheduler actually assigns all of the first
             // job's local mappers first, until the map slots are full
    }
  }
}
BTW, only one reduce task can be scheduled in a single heartbeat.
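The control flow above can be sketched as a runnable toy model (simplified,
hypothetical types; the real JobInProgress and assignTasks() carry far more
state):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the real Hadoop class; the names only mirror
// the originals, this is not the actual scheduler code.
class JobInProgress {
    final String name;
    int localTasksLeft;
    JobInProgress(String name, int localTasksLeft) {
        this.name = name;
        this.localTasksLeft = localTasksLeft;
    }
    // Returns a task id, or null when no local task remains (like
    // obtainNewLocalMapTask() returning null).
    String obtainNewLocalMapTask() {
        if (localTasksLeft == 0) return null;
        localTasksLeft--;
        return name + "-map-" + localTasksLeft;
    }
}

public class FcfsSketch {
    // Mirrors the loop structure in assignTasks(): one task per outer
    // iteration, and the break restarts the job scan from the front of
    // the queue each time, so earlier jobs' local tasks win.
    static List<String> assignTasks(int mapCapacity,
                                    List<JobInProgress> jobQueue) {
        List<String> assignedTasks = new ArrayList<>();
        for (int i = 0; i < mapCapacity; ++i) {
            for (JobInProgress job : jobQueue) {
                String t = job.obtainNewLocalMapTask();
                if (t != null) {
                    assignedTasks.add(t);
                    break; // back to the outer loop: rescan from job #1
                }
            }
        }
        return assignedTasks;
    }

    public static void main(String[] args) {
        List<JobInProgress> queue = new ArrayList<>();
        queue.add(new JobInProgress("job1", 2));
        queue.add(new JobInProgress("job2", 5));
        // With 4 slots, job1's two local tasks are taken first, then job2's.
        System.out.println(assignTasks(4, queue));
    }
}
```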
Best,
Nan