
Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/04/01 18:03:19 UTC

[jira] [Commented] (YARN-1888) Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port

    [ https://issues.apache.org/jira/browse/YARN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956690#comment-13956690 ] 

Jason Lowe commented on YARN-1888:
----------------------------------

I agree with [~kasha] on this.  A nodemanager coming up on a different port isn't necessarily the same nodemanager from a previous instance.  For example, the minicluster runs multiple nodes on the same host with different ports, so with this patch, if one of these nodes disappears it will no longer be reported as lost, since others are still running on the same host.

I think the real fix is to run the nodemanager with a non-ephemeral nodemanager port specified in yarn-site.xml.  This helps solve a number of issues:

# lost nodes count will be accurate
# a NM that reboots and rejoins the cluster before the RM expires the old instance will be correctly recognized as the same NM, and we avoid the RM thinking there are really two NMs on the host for up to the NM expiry interval
# attempts to start a subsequent NM on the same host where an NM is already running will fail rather than accidentally overcommit the node
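
As a rough illustration of the suggestion above, a fixed port can be set through the yarn.nodemanager.address property in yarn-site.xml, whose stock default uses port 0 (ephemeral). The port number 45454 and the 0.0.0.0 bind address below are arbitrary choices for the example, not values taken from this issue; pick whatever suits the cluster.

```xml
<!-- yarn-site.xml: minimal sketch, assuming port 45454 is free on every node -->
<property>
  <name>yarn.nodemanager.address</name>
  <!-- A non-zero port keeps the NM's host:port identity stable across restarts.
       45454 is only an example value. -->
  <value>0.0.0.0:45454</value>
</property>
```

With a fixed port, a second NM started on the same host should fail to bind rather than register as a separate node.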

> Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-1888
>                 URL: https://issues.apache.org/jira/browse/YARN-1888
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: zhaoyunjiong
>            Priority: Minor
>         Attachments: YARN-1888.patch
>
>
> When the NodeManager's port is set to 0, rebooting the NodeManager causes the "Lost Nodes" count to be inaccurate.


