Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/04/13 23:41:01 UTC

[jira] Commented: (HADOOP-133) the TaskTracker.Child.ping thread calls exit

    [ http://issues.apache.org/jira/browse/HADOOP-133?page=comments#action_12374421 ] 

Doug Cutting commented on HADOOP-133:
-------------------------------------

We can't always rely on cleanup/finally stuff to run.  JVMs can exit unexpectedly.  We hope it doesn't happen often, but we must be able to handle that situation.  If we need to, e.g., clean up temp files, we do that on startup.
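
As a rough sketch of the clean-up-on-startup idea: the class name, directory, and file prefix below are hypothetical and are not taken from Hadoop's code, and java.nio.file is a later Java API than anything available when this was written. The point is only that stale files left by a JVM that died get removed by the next process to start, rather than by a finally clause that may never run.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StartupCleanup {
    // Deletes regular files in dir whose names start with prefix.
    // Meant to be called once at startup, before any new temp files are created.
    // Returns the number of files removed. The prefix is assumed to contain
    // no glob metacharacters.
    static int removeStaleTempFiles(Path dir, String prefix) throws IOException {
        int removed = 0;
        if (!Files.isDirectory(dir)) {
            return removed; // nothing was left behind
        }
        try (DirectoryStream<Path> stale = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path p : stale) {
                if (Files.isRegularFile(p)) {
                    Files.delete(p);
                    removed++;
                }
            }
        }
        return removed;
    }
}
```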

The reason this was added was to handle the case where the tasktracker has exited and the child is somehow hung.  We must not leave stray, hung JVMs around.  Thread.interrupt() is not reliable enough.  When a thread is hung, it will not receive an interrupt.  I've seen this frequently when fetching, where socket read() requests hang indefinitely, despite the socket having a short read timeout.

So I'd be happy to have this first try to exit more gracefully, but, after a time, it should still call exit().  The child processes do not have a pid file.  Once their parent has died, nothing tracks them, so they must reliably exit fairly quickly when their parent dies.
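
A minimal sketch of that "graceful first, then exit()" behaviour, under stated assumptions: ParentWatchdog and stopOrExit are hypothetical names, not the TaskTracker.Child code, and the hard-exit action is passed in as a Runnable so the fallback can be exercised without killing the JVM.

```java
public class ParentWatchdog {
    // Interrupts the worker, waits up to graceMillis for it to finish, and
    // runs hardExit if it is still alive after that.
    // Returns true if the worker stopped on its own, false if hardExit ran.
    static boolean stopOrExit(Thread worker, long graceMillis, Runnable hardExit)
            throws InterruptedException {
        if (graceMillis <= 0) {
            // Thread.join(0) would wait forever, which defeats the purpose.
            throw new IllegalArgumentException("graceMillis must be positive");
        }
        worker.interrupt();          // polite request; a hung thread may never see it
        worker.join(graceMillis);    // bounded wait
        if (worker.isAlive()) {
            hardExit.run();          // last resort, e.g. () -> System.exit(1)
            return false;
        }
        return true;
    }
}
```

In the situation described here, the ping thread might call something like stopOrExit(taskThread, 5000, () -> System.exit(1)) once the parent stops responding; the five-second grace period is an arbitrary illustrative value.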

> the TaskTracker.Child.ping thread calls exit
> --------------------------------------------
>
>          Key: HADOOP-133
>          URL: http://issues.apache.org/jira/browse/HADOOP-133
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.1.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley

>
> The TaskTracker.Child.startPinging thread calls exit if the TaskTracker doesn't respond. Calling exit in a multi-threaded program is really problematic. In particular, it prevents cleanup/finally clauses from running. We need to move to a model where it uses Thread.interrupt(), which means we need to check the interrupt flag in the map loop and reduce loop and stop masking the InterruptedExceptions.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira