Posted to yarn-dev@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/09/04 01:19:51 UTC

[jira] [Resolved] (YARN-2510) RM can drop container completion events

     [ https://issues.apache.org/jira/browse/YARN-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe resolved YARN-2510.
------------------------------
    Resolution: Invalid

My apologies, this is an invalid report.  I accidentally grabbed the wrong container ID when searching the RM log; after looking again, the log shows no sign that the RM ever received the container completion event.  The 9 missing completion events on the AM were all from the same node, so I think this is a case of a poorly handled node failure that led to a MapReduce app hang.

I'll file a separate JIRA to track handling that case better.  That's probably a MapReduce fix, since the RM can't tell the container is no longer needed unless either the NM reports it completing (which it failed to do in this case due to a bad node) or the AM explicitly releases the container.
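
To illustrate the reasoning above, here is a minimal toy model of the bookkeeping involved.  It is not YARN code: the ToyResourceManager class, its method names, and the container and node names are all hypothetical, chosen only to show that a container stays held until one of the two signals arrives.

```python
from enum import Enum


class ContainerState(Enum):
    RUNNING = "running"
    COMPLETED = "completed"  # the NM reported the container as finished
    RELEASED = "released"    # the AM explicitly gave the container back


class ToyResourceManager:
    """Hypothetical stand-in for the RM's view of containers (not YARN code)."""

    def __init__(self):
        # container id -> (node name, current state)
        self._containers = {}

    def allocate(self, container_id, node):
        self._containers[container_id] = (node, ContainerState.RUNNING)

    def node_reports_completion(self, container_id):
        # Signal 1: the node manager tells the RM the container finished.
        node, _ = self._containers[container_id]
        self._containers[container_id] = (node, ContainerState.COMPLETED)

    def am_releases(self, container_id):
        # Signal 2: the application master explicitly releases the container.
        node, _ = self._containers[container_id]
        self._containers[container_id] = (node, ContainerState.RELEASED)

    def still_held(self):
        # Containers the RM must assume are still needed.
        return sorted(
            cid
            for cid, (_, state) in self._containers.items()
            if state is ContainerState.RUNNING
        )


if __name__ == "__main__":
    rm = ToyResourceManager()
    rm.allocate("container_1", "good-node")
    rm.allocate("container_2", "bad-node")

    rm.node_reports_completion("container_1")  # healthy node reports back
    # The bad node never reports, so container_2 remains held.
    print(rm.still_held())

    rm.am_releases("container_2")  # only an explicit release clears it
    print(rm.still_held())
```

In this sketch the container on the bad node stays in the held set until the AM releases it, which is why the fix likely belongs on the MapReduce side.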

> RM can drop container completion events
> ---------------------------------------
>
>                 Key: YARN-2510
>                 URL: https://issues.apache.org/jira/browse/YARN-2510
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Priority: Critical
>
> The RM can drop container completion events and fail to report them to the AM.  Details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)