Posted to mapreduce-dev@hadoop.apache.org by Sreejith Ramakrishnan <sr...@gmail.com> on 2013/06/28 23:13:51 UTC

How to Free-up a Map Slot without Killing the Entire Job?

I'm trying to implement an EDF (Earliest-Deadline-First) scheduler. The
scheduler should be able to kill or free up a running map task so that its
slot can be assigned to another job.

I did some looking around and found a kill() method in
org.apache.hadoop.mapred.JobInProgress. But this kills the entire job; I
want the job to keep running after a map slot has been reclaimed from it.

Can you guys tell me the right method/class to use?

Thanks

Re: How to Free-up a Map Slot without Killing the Entire Job?

Posted by Sreejith Ramakrishnan <sr...@gmail.com>.
Hey,

Thanks to everyone for the help. I know this is late, but I got so busy
coding the scheduler itself that I forgot to thank all of you.

Special thanks to you, Kun Ling, for taking the time to write the long and
detailed answer. It gave me a place to start.

Sincerely,
Sreejith R

Re: How to Free-up a Map Slot without Killing the Entire Job?

Posted by Kun Ling <lk...@gmail.com>.
Hi Sreejith,

    the "bin/hadoop" script provides an option to kill a single task attempt
by running "bin/hadoop job -kill-task <task-attempt-id>". That seems to be
what you need: it kills one attempt without killing the job.
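For reference, the id that -kill-task expects encodes the JobTracker start
time, the job number, the task type (m or r), the task number, and the
attempt number. A small self-contained parser for that format (this is an
illustrative sketch, not Hadoop code, and the example id below is invented):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttemptIdParser {
    // attempt_<jtStartTime>_<jobNum>_<m|r>_<taskNum>_<attemptNum>
    // e.g. attempt_201306281200_0042_m_000003_0 (invented id)
    private static final Pattern P =
        Pattern.compile("attempt_(\\d+)_(\\d+)_([mr])_(\\d+)_(\\d+)");

    // Returns the five components of a task attempt id.
    public static String[] parse(String id) {
        Matcher m = P.matcher(id);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a task attempt id: " + id);
        }
        return new String[] { m.group(1), m.group(2), m.group(3),
                              m.group(4), m.group(5) };
    }

    public static void main(String[] args) {
        String[] parts = parse("attempt_201306281200_0042_m_000003_0");
        System.out.println(String.join(",", parts));
    }
}
```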

    Here is how killTask works.

   ## 1. The JobClient tells the JobTracker which task to kill
    1.1. The JobClient recognizes this command and calls
JobSubmissionProtocol.killTask(), which asks the JobTracker to kill the task.

   ## 2. The JobTracker asks the TaskTracker to kill the task
    2.1. The JobTracker first checks whether the cluster is in safemode and
whether the task has already finished. If neither blocks the request, it
checks the permissions of the current user and then calls
TaskInProgress.killTask().

    2.2. A TreeMap named tasksToKill, maintained by TaskInProgress, stores
the tasks that need to be killed.
    2.3. The JobTracker uses getTasksToKill() to build a list of kill-task
actions, puts them into the heartbeat response, and sends it to the
TaskTracker.
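The bookkeeping in 2.2-2.3 can be sketched with plain collections (a toy
model with invented method shapes, not Hadoop's actual TaskInProgress):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class KillBookkeeping {
    // Maps task attempt id -> kill reason, like the tasksToKill TreeMap.
    private final TreeMap<String, String> tasksToKill = new TreeMap<>();

    public void markForKill(String attemptId, String reason) {
        tasksToKill.put(attemptId, reason);
    }

    // Like getTasksToKill(): drain the pending kills into a list of
    // actions that ride along on the next heartbeat response.
    public List<String> getTasksToKill() {
        List<String> actions = new ArrayList<>(tasksToKill.keySet());
        tasksToKill.clear();
        return actions;
    }
}
```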

    2.4. On the TaskTracker, the offerService() loop runs forever; each
iteration calls transmitHeartBeat() to fetch the HeartbeatResponse and then
processes the actions the JobTracker has asked it to perform, including the
kill-task action.

    2.5. Since the kill-task action is neither a LaunchTaskAction nor a
CommitTaskAction, it is passed to addActionToCleanup(), where the action's
id is used to put it into the allCleanupActions queue for processing.

    2.6. The TaskCleanupThread in the TaskTracker runs taskCleanup(), which
calls processKillTaskAction(); that in turn calls the kill() method of the
TaskInProgress object, which moves the task's state from RUNNING to
KILLED_UNCLEAN, asks the directoryCleanupThread to clean up the task's
directory and release the slot, and finally notifies the JobTracker via
heartbeat.
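Steps 2.4-2.6 amount to a dispatch loop feeding a cleanup queue. A
self-contained sketch of that routing (invented class names, single-threaded
for clarity; Hadoop uses a real TaskCleanupThread):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class TaskTrackerSim {
    enum State { RUNNING, KILLED_UNCLEAN }

    private final Map<String, State> tasks = new HashMap<>();
    // Like allCleanupActions in 2.5.
    private final Queue<String> allCleanupActions = new ArrayDeque<>();

    public void launch(String attemptId) {
        tasks.put(attemptId, State.RUNNING);
    }

    // Like the heartbeat processing in 2.4: kill actions are neither
    // launch nor commit, so they are routed to the cleanup queue.
    public void handleAction(String type, String attemptId) {
        if (type.equals("KILL")) {
            allCleanupActions.add(attemptId);
        }
        // LAUNCH / COMMIT handling omitted
    }

    // Like taskCleanup()/processKillTaskAction() in 2.6: flip each queued
    // attempt from RUNNING to KILLED_UNCLEAN.
    public void runCleanupThreadOnce() {
        String attemptId;
        while ((attemptId = allCleanupActions.poll()) != null) {
            tasks.put(attemptId, State.KILLED_UNCLEAN);
        }
    }

    public State stateOf(String attemptId) {
        return tasks.get(attemptId);
    }
}
```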

  ## 3. After the JobTracker learns that the TaskTracker has killed the
task, it asks the TaskTracker to run a cleanup task.

   3.1.  The JobTracker sees the KILLED_UNCLEAN status of the task attempt,
converts the task into a task-cleanup task, and puts it into the
mapCleanupTasks or reduceCleanupTasks list of the JobInProgress object,
according to the original task type. The task list is then passed to the
TaskTracker via heartbeat.

   3.2.  The TaskTracker runs the cleanup task, removes the temporary files
generated by the killed task attempt, marks the cleanup task SUCCEEDED, and
reports to the JobTracker via heartbeat.

   3.3. The JobTracker receives the heartbeat and knows that the task has
been killed.
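Put together, sections 2 and 3 describe a short state progression for one
task attempt. A toy trace of that progression (invented names, not Hadoop's
actual state machine):

```java
import java.util.ArrayList;
import java.util.List;

public class KillLifecycle {
    enum State { RUNNING, KILLED_UNCLEAN, CLEANUP_SUCCEEDED, KILLED }

    // Replays the transitions from sections 2 and 3 for one attempt.
    public static List<State> trace() {
        List<State> states = new ArrayList<>();
        states.add(State.RUNNING);            // attempt is running
        states.add(State.KILLED_UNCLEAN);     // 2.6: TaskTracker kills it
        states.add(State.CLEANUP_SUCCEEDED);  // 3.2: cleanup task finishes
        states.add(State.KILLED);             // 3.3: JobTracker records it
        return states;
    }
}
```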


yours,
Kun Ling

http://cn.linkedin.com/pub/kun-ling/20/3/515







-- 
http://www.lingcc.com

Re: How to Free-up a Map Slot without Killing the Entire Job?

Posted by Sandy Ryza <sa...@cloudera.com>.
The fair scheduler does preemption as well.

-Sandy


On Sat, Jun 29, 2013 at 12:05 PM, Steve Loughran <st...@hortonworks.com> wrote:

> there's a scheduler in contrib/ that does pre-emption. look here
>
>
> https://github.com/apache/hadoop-common/tree/branch-0.22/mapreduce/src/contrib/dynamic-scheduler
>

Re: How to Free-up a Map Slot without Killing the Entire Job?

Posted by Steve Loughran <st...@hortonworks.com>.
There's a scheduler in contrib/ that does pre-emption. Look here:

https://github.com/apache/hadoop-common/tree/branch-0.22/mapreduce/src/contrib/dynamic-scheduler
