Posted to common-dev@hadoop.apache.org by Marko Bauhardt <mb...@media-style.com> on 2006/03/14 18:19:54 UTC
killing a local job never worked?
Hi developers.
How do I kill a locally running job?
I use the Hadoop LocalJobRunner embedded in a JVM.
The API allows killing a local job, but it looks like this has never
worked the way killing a distributed job works today.
The problem I see is that LocalJobRunner triggers the stop() method of
its private inner class Job (which is a thread).
But this only kills the job thread and never kills the running map
and reduce tasks that still use job resources.
The result is that triggering a job kill in LocalJobRunner makes the
tasks throw tons of NPEs.
What do people think about implementing a clean kill method instead
of using Thread.stop() in the inner class Job of LocalJobRunner?
Thanks for any comments.
Marko.
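[Editor's note: a runnable sketch of the failure mode Marko describes. All names here are hypothetical and this is not Hadoop source; Thread.interrupt() stands in for the deprecated Thread.stop(). The point is that ending the "job" thread does not end a thread the task spawned, which keeps touching job state and produces the NPEs.]

```java
// Hypothetical demo: a "job" thread spawns a worker (as a mapper with its
// own helper threads would). Killing the job thread orphans the worker.
public class OrphanWorkerDemo {
    static volatile boolean workerStillAlive = false;

    public static void main(String[] args) throws Exception {
        final Thread[] workerHolder = new Thread[1];
        Thread job = new Thread(() -> {
            // mimic a map task that starts its own helper thread
            Thread worker = new Thread(() -> {
                try { Thread.sleep(500); } catch (InterruptedException ignored) {}
            });
            worker.start();
            workerHolder[0] = worker;
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                // "killed": the job thread exits here, worker keeps running
            }
        });
        job.start();
        Thread.sleep(100);   // let the worker start
        job.interrupt();     // stand-in for Thread.stop() on the job thread
        job.join();
        workerStillAlive = workerHolder[0].isAlive();
        System.out.println("worker still alive after job death: " + workerStillAlive);
    }
}
```

The worker outlives the job thread, so any job resources it touches may already have been torn down.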
Re: killing a local job never worked?
Posted by Doug Cutting <cu...@apache.org>.
Marko Bauhardt wrote:
> How do I kill a locally running job?
>
> I use the Hadoop LocalJobRunner embedded in a JVM.
> The API allows killing a local job, but it looks like this has never
> worked the way killing a distributed job works today.
> The problem I see is that LocalJobRunner triggers the stop() method of
> its private inner class Job (which is a thread).
> But this only kills the job thread and never kills the running map and
> reduce tasks that still use job resources.
The map and reduce tasks are run in the job thread, and so should be
killed by thread.stop() too, no? Unless the map or reduce tasks have
threads of their own (like Nutch's Fetcher). In that case we should
probably fix the mapper or reducer to stop its own threads when the
parent task thread exits, no?
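[Editor's note: the fix Doug sketches could look like the following. Names are hypothetical and this is not Hadoop code; it shows a helper thread that keeps working only while its parent task thread is alive.]

```java
// Hypothetical sketch: a worker started by a mapper watches its parent
// thread and exits once the parent is gone.
public class ParentWatchingWorker {
    static volatile boolean workerExited = false;

    public static Thread startWorker(Thread parent, Runnable unitOfWork) {
        Thread worker = new Thread(() -> {
            // keep working only while the mapper/reducer thread is alive
            while (parent.isAlive() && !Thread.currentThread().isInterrupted()) {
                unitOfWork.run();
            }
        });
        worker.setDaemon(true); // belt and braces: never keep the JVM alive
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws Exception {
        Thread parent = new Thread(() -> {
            try { Thread.sleep(150); } catch (InterruptedException ignored) {}
        });
        parent.start();
        Thread worker = startWorker(parent, () -> {
            try { Thread.sleep(20); } catch (InterruptedException ignored) {}
        });
        parent.join();
        worker.join(1_000);  // worker notices the dead parent and exits
        workerExited = !worker.isAlive();
        System.out.println("worker exited: " + workerExited);
    }
}
```

Marking the worker as a daemon thread is a second safety net: even if the liveness check were missed, a daemon thread cannot keep the JVM running on its own.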
> The result is that triggering a job kill in LocalJobRunner makes the
> tasks throw tons of NPEs.
>
> What do people think about implementing a clean kill method instead
> of using Thread.stop() in the inner class Job of LocalJobRunner?
In a distributed configuration, map and reduce tasks discover that
their parent tasktracker has been killed when the ping() method throws
an exception. This causes the child JVM to exit, killing all its
threads. So we should make ping() and all other callback methods in
LocalJobRunner throw an exception once the job has been killed. But the
framework can have no knowledge of threads started by the mapper or
reducer. These threads must be somehow stopped by the mapper or reducer
itself when the job thread is killed. In a distributed configuration
this is automatic, since they're in a separate JVM that exits, but with
LocalJobRunner the mapper and reducer must become more responsible.
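[Editor's note: Doug's suggestion as a simplified sketch, not actual LocalJobRunner code. Once the job is marked killed, the callback a task invokes throws, and the task unwinds just as it would when a real tasktracker dies.]

```java
import java.io.IOException;

// Hypothetical sketch of the proposed behavior: after killJob(), every
// callback from a task (ping() here) throws, so task loops unwind.
public class LocalRunnerSketch {
    private volatile boolean killed = false;

    public void killJob() {
        killed = true;
    }

    /** Stand-in for the ping()/progress() callbacks tasks invoke. */
    public void ping() throws IOException {
        if (killed) {
            throw new IOException("job was killed");
        }
    }

    public static void main(String[] args) {
        LocalRunnerSketch runner = new LocalRunnerSketch();
        runner.killJob();
        try {
            runner.ping();
            System.out.println("still running");
        } catch (IOException e) {
            System.out.println("task sees kill: " + e.getMessage());
        }
    }
}
```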
You might argue that we should support a cleaner shutdown protocol. But
we must be able to handle unclean shutdowns anyway, when, e.g., JVMs
exit unexpectedly. Since we need to handle such unclean shutdowns,
there's really no point in also supporting a clean shutdown; it just
adds complexity. So, in LocalJobRunner, I think
thread.stop() on the job thread is sufficient. Does that make sense?
Doug