Posted to mapreduce-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2009/11/07 02:39:41 UTC

[jira] Commented: (MAPREDUCE-1119) When tasks fail to report status, show tasks's stack dump before killing

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774531#action_12774531 ] 

Todd Lipcon commented on MAPREDUCE-1119:
----------------------------------------

- I'd prefer SIGQUIT_TASK_JVM rather than QUIT_TASK_JVM for clarity's sake. It's a little less consistent, but more obvious for people reading the code later on.
- In destroyTaskJVM, you sleep for sleeptimeBeforeSigkill between the SIGQUIT and the SIGKILL. This seems wrong; the wait after SIGQUIT should probably have its own timeout.
- ProcessTree.java: move the SIGQUIT constant to the top of the class.
- ProcessTree.java: the whitespace indentation of sigQuitProcess looks wrong.
- Can we refactor the various signal-sending methods in ProcessTree to share common code? There are a lot of very similar methods.
- This currently causes stack traces for *all* killed tasks, right? I don't personally have a problem with that, but the description of the JIRA indicates that only tasks killed for failing to report status will dump their stacks, and it's worth noting the difference.
- LinuxTaskController.finishTask is now sort of a misnomer, since you're using it to send SIGQUIT. Maybe rename to sendKillSignal or something?
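
To make the timeout and refactoring points above concrete, here is a minimal sketch of what shared signal-sending code with a separate post-SIGQUIT wait might look like. All names here (SignalSketch, SignalSender, killCommand, dumpThenKill, dumpWaitMs) are hypothetical and are not taken from the patch or from ProcessTree; the signal numbers are the standard Linux values, and the sketch assumes Java 8 or later and a kill(1) binary on the PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; this is not the actual ProcessTree code. */
public class SignalSketch {

    /** Signals discussed in the review, with their standard Linux numbers. */
    public enum Signal {
        SIGQUIT(3), SIGKILL(9), SIGTERM(15);

        final int number;

        Signal(int number) {
            this.number = number;
        }
    }

    /** Delivery mechanism, kept behind an interface so it can be swapped out. */
    public interface SignalSender {
        void send(String pid, Signal signal) throws IOException;
    }

    /** The single place that knows how a kill(1) command line is built. */
    public static List<String> killCommand(String pid, Signal signal) {
        return Arrays.asList("kill", "-" + signal.number, pid);
    }

    /** Sender that shells out to kill(1) and waits for it to exit. */
    public static final SignalSender SHELL_SENDER = (pid, signal) -> {
        Process p = new ProcessBuilder(killCommand(pid, signal)).inheritIO().start();
        try {
            p.waitFor();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    };

    /**
     * Sends SIGQUIT, waits dumpWaitMs so the JVM can write its thread dump,
     * then sends SIGKILL. dumpWaitMs is deliberately a separate parameter
     * rather than a reuse of the SIGTERM-to-SIGKILL delay.
     */
    public static void dumpThenKill(SignalSender sender, String pid, long dumpWaitMs)
            throws IOException, InterruptedException {
        sender.send(pid, Signal.SIGQUIT);
        Thread.sleep(dumpWaitMs);
        sender.send(pid, Signal.SIGKILL);
    }
}
```

With something along these lines, each per-signal method collapses into a one-line call to the shared sender, and a call such as SignalSketch.dumpThenKill(SignalSketch.SHELL_SENDER, pid, 1000L) keeps the dump wait independent of sleeptimeBeforeSigkill.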

> When tasks fail to report status, show tasks's stack dump before killing
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1119
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.patch
>
>
> When the TT kills tasks that haven't reported status, it should somehow gather a stack dump for the task. This could be done either by sending a SIGQUIT (so the dump ends up in stdout) or perhaps something like JDI to gather the stack directly from Java. This may be somewhat tricky since the child may be running as another user (so the SIGQUIT would have to go through LinuxTaskController). This feature would make debugging these kinds of failures much easier, especially if we could somehow get it into the TaskDiagnostic message.
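
For a sense of the kind of text that could end up in a diagnostic message, here is a small sketch that formats the stacks of all live threads into a string. Note that this is neither of the approaches named in the description: it runs inside the JVM being inspected and uses Thread.getAllStackTraces(), so it only illustrates the output format, not how the TT would collect a dump from a child process. The class and method names (StackDumpSketch, dumpAllThreads) are made up for illustration.

```java
import java.util.Map;

/** Hypothetical sketch: in-process thread dump, not SIGQUIT and not JDI. */
public class StackDumpSketch {

    /** Returns one header line per live thread followed by its stack frames. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Header: quoted thread name plus its current state.
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            // One indented line per frame, similar in spirit to a SIGQUIT dump.
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```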
