Posted to mapreduce-issues@hadoop.apache.org by "Eric Payne (JIRA)" <ji...@apache.org> on 2016/02/12 14:35:18 UTC

[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144582#comment-15144582 ] 

Eric Payne commented on MAPREDUCE-5044:
---------------------------------------

Hi [~jira.shegalov]. I would like to see this functionality implemented. We occasionally see containers time out, and it would be good if users could have direct feedback in the form of a jstack to help them debug their applications.

I have been coming up to speed on the work that's already been committed in this area under YARN-445 and its children. IIUC, YARN-445 and its children put in place the infrastructure for a {{Client -> RM -> NM -> Container}} signal path. On the other hand, this JIRA (along with YARN-1515) implements an {{AM -> NM -> Container}} signal path and the ability to send multiple signals per call.

These pieces could probably be split into separate JIRAs. Either way, I think much of the work already done in this JIRA could be reused to add an interface to {{ContainerManagementProtocol}} that lets the AM ask the NM to signal a container to dump its stack before the container is killed on a timeout.
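
To make the "multiple signals per call" idea concrete, here is a minimal Java sketch of the ordered signal plan an AM might request for a timed-out attempt. All class, enum, and method names below are hypothetical; none of them come from {{ContainerManagementProtocol}} or from the attached patches. The TERM-then-KILL tail and the grace periods are also assumptions. The only behavior relied on is that a HotSpot JVM prints a thread dump to its stdout when it receives SIGQUIT.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: models the ordered signals an AM could ask the NM
// to deliver to a timed-out container. This is NOT the YARN API.
final class TimeoutSignalSketch {

    // Illustrative signal names (assumption, not a YARN enum).
    enum Signal { QUIT, TERM, KILL }

    // One step of the plan: a signal plus how long to wait before the next step.
    static final class Step {
        final Signal signal;
        final long delayAfterMs;

        Step(Signal signal, long delayAfterMs) {
            this.signal = signal;
            this.delayAfterMs = delayAfterMs;
        }
    }

    // Builds the plan. When dumpThreads is true, SIGQUIT goes first so the JVM
    // can write its thread dump before the container is terminated.
    static List<Step> planForTimedOutAttempt(boolean dumpThreads,
                                             long dumpGraceMs,
                                             long killGraceMs) {
        if (dumpGraceMs < 0 || killGraceMs < 0) {
            throw new IllegalArgumentException("grace periods must be >= 0");
        }
        List<Step> steps = new ArrayList<>();
        if (dumpThreads) {
            steps.add(new Step(Signal.QUIT, dumpGraceMs));
        }
        steps.add(new Step(Signal.TERM, killGraceMs));
        steps.add(new Step(Signal.KILL, 0L));
        return steps;
    }

    public static void main(String[] args) {
        // Print the plan for an attempt that should dump its stack first.
        for (Step s : planForTimedOutAttempt(true, 1000L, 250L)) {
            System.out.println(s.signal + " then wait " + s.delayAfterMs + " ms");
        }
    }
}
```

Sending the whole plan in one request would keep the ordering and the grace periods on the NM side, so the AM would not need a separate round trip for each signal.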

Is there a possibility that this JIRA will move forward? Ideally, we would like all of it backported to 2.7. Please let me know if there's anything I can do to help.

> Have AM trigger jstack on task attempts that timeout before killing them
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5044
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack output via SIGQUIT before killing the task attempt.  This would be invaluable for helping users debug their hung tasks, especially if they do not have shell access to the nodes.


