Posted to common-dev@hadoop.apache.org by Marko Bauhardt <mb...@media-style.com> on 2006/03/14 18:19:54 UTC

killing a local job never worked?

Hi developers.
How to kill a local running job?

I use the Hadoop LocalJobRunner integrated in a virtual machine.
The API allows killing a local job, but it looks like this has never
worked the way killing a distributed job works today.
The problem I see is that in the LocalJobRunner the stop method of
the private inner class Job (which is a thread) is triggered.
But this method only stops the job thread and never kills the running
map and reduce tasks that still use the job's resources.
The result is that when a job kill is triggered in the LocalJobRunner,
the tasks throw tons of NPEs.

What do people think about implementing a clean kill method instead of
using Thread.stop in the inner class Job of the LocalJobRunner?
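
Roughly what I have in mind, sketched with made-up names (this is not
the real LocalJobRunner code, just the idea of a kill flag instead of
Thread.stop):

public class CleanKillSketch {

  // Stand-in for the private inner Job thread inside LocalJobRunner;
  // all names here are illustrative, not Hadoop APIs.
  static class Job extends Thread {
    private volatile boolean killed = false;

    // Clean kill: set a flag and interrupt, instead of Thread.stop().
    public void killJob() {
      killed = true;
      interrupt();
    }

    @Override
    public void run() {
      try {
        // Imagine this loop running the map and reduce tasks in turn.
        while (!killed && !isInterrupted()) {
          runNextTaskChunk();
        }
      } finally {
        cleanUpTaskResources(); // release the job's resources before exiting
      }
    }

    private void runNextTaskChunk() {
      // placeholder: run a slice of a map or reduce task
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        killed = true; // treat interruption as a kill request
      }
    }

    private void cleanUpTaskResources() {
      // placeholder: close open files, drop temporary output, etc.
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Job job = new Job();
    job.start();
    Thread.sleep(50);
    job.killJob(); // tasks see the flag and stop; no Thread.stop() needed
    job.join();
  }
}

The point is that the tasks see the kill and can release their
resources themselves, instead of being stopped in the middle of
using them.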

Thanks for any comments.
Marko.





Re: killing a local job never worked?

Posted by Doug Cutting <cu...@apache.org>.
Marko Bauhardt wrote:
> How to kill a local running job?
> 
> I use the Hadoop LocalJobRunner integrated in a virtual machine.
> The API allows killing a local job, but it looks like this has never worked 
> the way killing a distributed job works today.
> The problem I see is that in the LocalJobRunner the stop method of the 
> private inner class Job (which is a thread) is triggered.
> But this method only stops the job thread and never kills the running map and 
> reduce tasks that still use the job's resources.

The map and reduce tasks are run in the job thread, and so should be 
killed by thread.stop() too, no?  Unless the map or reduce has threads 
of its own (like Nutch's Fetcher).  In that case we should probably 
fix the mapper or reducer so that its threads exit when the parent 
mapper or reducer thread exits, no?
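
For example, a mapper-owned worker thread could watch the parent
thread and exit when it dies.  A rough sketch with hypothetical names
(not the actual Fetcher code):

public class ParentWatchingWorker extends Thread {
  private final Thread parent;

  public ParentWatchingWorker(Thread parent) {
    this.parent = parent;
    setDaemon(true); // also harmless in the distributed, separate-JVM case
  }

  @Override
  public void run() {
    // Work only as long as the parent (task/job) thread is still alive.
    while (parent.isAlive()) {
      doSomeWork();
    }
    // Parent is gone (e.g. the job thread was stopped): clean up and exit.
    cleanUp();
  }

  private void doSomeWork() {
    // placeholder: fetch a page, parse a record, ...
  }

  private void cleanUp() {
    // placeholder: close sockets, flush buffers, ...
  }

  // A mapper could start one like this:
  //   new ParentWatchingWorker(Thread.currentThread()).start();
}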

> The result is that when a job kill is triggered in the LocalJobRunner, 
> the tasks throw tons of NPEs.
> 
> What do people think about implementing a clean kill method instead of using 
> Thread.stop in the inner class Job of the LocalJobRunner?

In a distributed configuration, map and reduce tasks discover that 
their parent tasktracker has been killed when the ping() method throws 
an exception.  This causes the child JVM to exit, killing all its 
threads.  So we should make ping() and all other callback methods in 
LocalJobRunner throw an exception once the job has been killed.  But the 
framework can have no knowledge of threads started by the mapper or 
reducer.  These threads must somehow be stopped by the mapper or reducer 
itself when the job thread is killed.  In a distributed configuration 
this is automatic, since they're in a separate JVM that exits, but with 
LocalJobRunner the mapper and reducer must become more responsible.
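
Sketched with made-up names (this is not the actual umbilical
interface), the LocalJobRunner side of that could look roughly like
this:

import java.io.IOException;

public class LocalUmbilicalSketch {
  private volatile boolean jobKilled = false;

  // Called by the local job runner when the user kills the job.
  public void killJob() {
    jobKilled = true;
  }

  private void checkKilled() throws IOException {
    if (jobKilled) {
      throw new IOException("job has been killed");
    }
  }

  // Stand-ins for the task callbacks (ping, progress, ...): once the
  // job is killed, every call fails, so the task gives up just like a
  // distributed child does when ping() to its tasktracker fails.
  public boolean ping(String taskId) throws IOException {
    checkKilled();
    return true;
  }

  public void progress(String taskId, float amount) throws IOException {
    checkKilled();
    // record the task's progress here
  }
}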

You might argue that we should support a cleaner shutdown protocol.  But 
we must be able to handle unclean shutdowns anyway, when, e.g., JVMs 
exit unexpectedly.  Since we need to handle such unclean shutdowns, 
there's really no point in also supporting a clean shutdown; it just 
adds complexity.  So, in LocalJobRunner, I think thread.stop() on the 
job thread is sufficient.  Does that make sense?

Doug