Posted to yarn-dev@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/09/04 01:19:51 UTC
[jira] [Resolved] (YARN-2510) RM can drop container completion events
[ https://issues.apache.org/jira/browse/YARN-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe resolved YARN-2510.
------------------------------
Resolution: Invalid
My apologies, this is an invalid report. I accidentally grabbed the wrong container ID when searching the RM log; after looking again, I don't see the RM receiving the container completion event. The 9 missing completion events on the AM were all from the same node, so I think this is a case of a poorly handled node failure that led to a MapReduce app hang.
I'll file a separate JIRA to track handling that case better. That is probably a MapReduce fix, since the RM can't tell the container is no longer needed unless either the NM reports it completing (which it failed to do in this case due to a bad node) or the AM explicitly releases the container.
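To illustrate the reasoning above, here is a minimal toy model in plain Python. It is not YARN code: the class, its methods, and the container IDs are hypothetical and only mimic the two ways the RM can learn that a container is finished. A container on a bad node that is never reported by the NM and never released by the AM simply stays live from the RM's point of view.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for the RM's view of containers (not YARN code)."""
    live: set = field(default_factory=set)
    completed_for_am: list = field(default_factory=list)

    def allocate(self, container_id: str) -> None:
        self.live.add(container_id)

    def on_nm_completion_report(self, container_id: str) -> None:
        # Path 1: the NM reports the container as completed, so the
        # completion can be passed on to the AM.
        if container_id in self.live:
            self.live.remove(container_id)
            self.completed_for_am.append(container_id)

    def on_am_release(self, container_id: str) -> None:
        # Path 2: the AM explicitly releases the container. This toy
        # model just stops tracking it.
        self.live.discard(container_id)


rm = ToyResourceManager()
for cid in ("c1", "c2", "c3"):
    rm.allocate(cid)

rm.on_nm_completion_report("c1")  # healthy node reports completion
rm.on_am_release("c2")            # AM gives the container back
# "c3" ran on a bad node: no NM report and no AM release ever arrives.

print(sorted(rm.live))       # ['c3']
print(rm.completed_for_am)   # ['c1']
```

In this sketch "c3" is the analogue of the 9 containers on the failed node: nothing is dropped by the RM, it just never hears that the container is done.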
> RM can drop container completion events
> ---------------------------------------
>
> Key: YARN-2510
> URL: https://issues.apache.org/jira/browse/YARN-2510
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.5.0
> Reporter: Jason Lowe
> Priority: Critical
>
> The RM can drop container completion events and fail to report them to the AM. Details in the first comment.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)