Posted to yarn-issues@hadoop.apache.org by "Sangjin Lee (JIRA)" <ji...@apache.org> on 2015/11/21 00:52:11 UTC

[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

    [ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019099#comment-15019099 ] 

Sangjin Lee commented on YARN-4180:
-----------------------------------

Does this issue exist in 2.6.x? Should this be backported to branch-2.6?

> AMLauncher does not retry on failures when talking to NM 
> ---------------------------------------------------------
>
>                 Key: YARN-4180
>                 URL: https://issues.apache.org/jira/browse/YARN-4180
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Anubhav Dhoot
>            Assignee: Anubhav Dhoot
>            Priority: Critical
>             Fix For: 2.7.2
>
>         Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>
> We see issues with the RM trying to launch a container while an NM is restarting, and we get exceptions like NMNotReadyException. While YARN-3842 added retry for other clients of the NM (mainly AMs), it is not used by AMLauncher in the RM, so these intermittent errors cause job failures. This can manifest during a rolling restart of NMs.
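
For illustration only, the kind of retry the description asks for can be sketched in plain Java. This is not the YARN-4180 patch and it does not use Hadoop's retry classes: RetrySketch, NotReadyException (a stand-in for NMNotReadyException), callWithRetry, and the attempt count and sleep interval are all hypothetical names and values chosen for the example.

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    /** Hypothetical stand-in for a transient error such as NMNotReadyException. */
    public static class NotReadyException extends Exception {
        public NotReadyException(String msg) {
            super(msg);
        }
    }

    /**
     * Calls op up to maxAttempts times, sleeping sleepMs between attempts.
     * Only NotReadyException is retried; any other exception propagates at once.
     */
    public static <T> T callWithRetry(Callable<T> op, int maxAttempts, long sleepMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        NotReadyException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (NotReadyException e) {
                last = e; // transient failure: remember it and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        throw last; // every attempt failed with a transient error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated launch call: fails twice while the "NM" restarts, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new NotReadyException("NM still starting");
            }
            return "container launched";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Without such a wrapper, the first NotReadyException would surface to the caller immediately, which matches the intermittent job failures described above.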


