Posted to common-dev@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2008/08/22 00:28:44 UTC

[jira] Commented: (HADOOP-3994) There is little information provided when the TaskTracker kills a Task that has not reported within the timeout (600 sec) interval - this patch provides a stack trace of the task

    [ https://issues.apache.org/jira/browse/HADOOP-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624500#action_12624500 ] 

Steve Loughran commented on HADOOP-3994:
----------------------------------------

This could be really useful; anything to get the PID of a forked process would be handy. As you note, UNIXProcess is undocumented and only likely to surface on Sun-derived JVMs; the other risk is that their private code is unstable and may change between releases. Still, it would be useful in other places in the Apache portfolio.

* All code that deals with this class should live outside TaskRunner, in a separate class for use on demand.
* The class should include a check that reports whether the operation is going to work.
* To test, fork a process that sleeps for 30s or so and, before that sleep has finished, try to get a stack dump.
* I could imagine a kill() method being useful too.
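
To make the first two points concrete, here is a rough sketch of what such a standalone helper might look like. It is not taken from the attached patch: the class name ForkedProcessPid and its method names are made up for illustration, and it assumes the JVM's Process implementation keeps the PID in a private field named "pid", as Sun's UNIXProcess is understood to do. On JVMs where that field is missing or reflection is refused, it reports "unsupported" rather than failing.

```java
import java.lang.reflect.Field;

/**
 * Hypothetical helper (not part of the attached patch): looks up the PID of a
 * forked process by reflecting on a private "pid" field of the JVM's Process
 * implementation. The field name is an assumption about undocumented code.
 */
final class ForkedProcessPid {

    private ForkedProcessPid() {
        // static helpers only
    }

    /** Returns the accessible "pid" field of the given class, or null if unavailable. */
    static Field findPidField(Class<?> processClass) {
        try {
            Field field = processClass.getDeclaredField("pid");
            field.setAccessible(true);
            return field;
        } catch (NoSuchFieldException e) {
            // This JVM's Process implementation has no such field.
            return null;
        } catch (RuntimeException e) {
            // The security manager or the JVM refused reflective access.
            return null;
        }
    }

    /** The check callers can use to find out whether pidOf() is going to work. */
    static boolean isSupported(Process process) {
        return process != null && findPidField(process.getClass()) != null;
    }

    /** Returns the PID of the forked process, or -1 if it cannot be determined. */
    static long pidOf(Process process) {
        if (process == null) {
            return -1L;
        }
        Field field = findPidField(process.getClass());
        if (field == null) {
            return -1L;
        }
        try {
            // The field may be an int or a long depending on the JVM.
            return ((Number) field.get(process)).longValue();
        } catch (IllegalAccessException e) {
            return -1L;
        } catch (RuntimeException e) {
            // Includes a non-numeric field (ClassCastException).
            return -1L;
        }
    }
}
```

A caller such as the TaskTracker would check isSupported(process) first, log a warning and skip the stack dump if it returns false, and otherwise hand pidOf(process) to whatever external program produces the dump. A kill() method could sit in the same class and reuse the same PID lookup.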



> There is little information provided when the TaskTracker kills a Task that has not reported within the timeout (600 sec) interval - this patch provides a stack trace of the task 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3994
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3994
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Jason
>            Priority: Minor
>         Attachments: 0.16_patch
>
>
> When we have a task that is killed for not reporting, sometimes there is an obvious programming error, and sometimes the reason the job didn't report is unclear.
> This patch will cause the TaskTracker to try to generate a stack trace of the offending task before the task is killed.
> Given how opaque process control is in Java, a program is run to generate the stack trace, using the PID extracted from the undocumented UNIXProcess class.
> The attached patch is against 0.16.0, as that is the release we use.
> This will only work on Unix machines, or on JVMs that use the java.lang.UNIXProcess implementation for the Java Process object.
> The script that generates the stack trace is very Linux-specific.
> The code changes will run without failure on JVMs where the UNIXProcess class is not available, but no stack trace will be generated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.