Posted to mapreduce-issues@hadoop.apache.org by "Eric Payne (JIRA)" <ji...@apache.org> on 2016/02/20 22:39:18 UTC

[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated MAPREDUCE-5044:
----------------------------------
    Attachment: MAPREDUCE-5044.v07.local.patch

Thanks, [~jira.shegalov], for all of the work already done on this JIRA.

I have upmerged the latest patch and integrated it with the {{SignalContainerRequest}} that was added as part of YARN-445 and its children.

[~mingma], [~xgong], [~jlowe], [~jira.shegalov], would you please take a look?

I would like to see the functionality in this JIRA implemented. We occasionally see containers time out, and it would be good if users could get direct feedback in the form of a jstack to help them debug their applications.

IIUC, YARN-445 and its children put in place the infrastructure for a {{Client -> RM -> NM -> Container}} signal path. However, to automatically dump the jstack when a container times out, we still need an {{AM -> NM -> Container}} signal path. This JIRA (MAPREDUCE-5044, along with YARN-1515) adds that signal path, along with the ability to send multiple signals per call.

I think sending multiple signals per call could be split into a separate JIRA.
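
For illustration only, here is a minimal local sketch of the dump-then-kill sequence this JIRA is after. It is not the patch and does not use any YARN API: a Python child process stands in for the hung task, and it registers a SIGQUIT handler through {{faulthandler}} to mimic the JVM's default thread-dump behavior. The names {{HUNG_TASK}}, {{dump_then_kill}}, {{run_demo}} and the {{grace_seconds}} value are assumptions made for the example, and it runs on POSIX systems only.

```python
import signal
import subprocess
import sys
import time

# Stand-in for a hung task attempt (an assumption for this sketch).
# It dumps all thread stacks to stderr on SIGQUIT, similar to what a
# JVM does by default, reports that it is ready, then hangs.
HUNG_TASK = (
    "import faulthandler, signal, time\n"
    "faulthandler.register(signal.SIGQUIT, all_threads=True)\n"
    "print('ready', flush=True)\n"
    "time.sleep(3600)\n"
)


def dump_then_kill(proc, grace_seconds=0.5):
    """Ask the process for a stack dump via SIGQUIT, then kill it.

    grace_seconds is a hypothetical pause that gives the process time
    to write the dump before SIGKILL arrives.
    """
    proc.send_signal(signal.SIGQUIT)
    time.sleep(grace_seconds)
    proc.kill()  # SIGKILL on POSIX
    _, stderr = proc.communicate()
    return stderr


def run_demo():
    proc = subprocess.Popen(
        [sys.executable, "-c", HUNG_TASK],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # Wait until the child has installed its SIGQUIT handler; sending
    # the signal earlier would simply terminate it.
    proc.stdout.readline()
    dump = dump_then_kill(proc)
    return dump, proc.returncode


if __name__ == "__main__":
    dump, returncode = run_demo()
    print(dump)
    print("exit status:", returncode)
```

In the real change, the two steps would travel over the {{AM -> NM -> Container}} path rather than being sent directly to a local process, and the dump would end up in the container's logs instead of a pipe.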


> Have AM trigger jstack on task attempts that timeout before killing them
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5044
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, MAPREDUCE-5044.v07.local.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack output via SIGQUIT before killing the task attempt.  This would be invaluable for helping users debug their hung tasks, especially if they do not have shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)