Posted to yarn-issues@hadoop.apache.org by "qiwei huang (Jira)" <ji...@apache.org> on 2019/09/09 10:11:00 UTC

[jira] [Created] (YARN-9823) NodeManager cannot get right ResourceTrack address in Federation mode

qiwei huang created YARN-9823:
---------------------------------

             Summary: NodeManager cannot get right ResourceTrack address in Federation mode
                 Key: YARN-9823
                 URL: https://issues.apache.org/jira/browse/YARN-9823
             Project: Hadoop YARN
          Issue Type: Bug
          Components: federation, nodemanager
    Affects Versions: 2.9.2
         Environment: h2. Hadoop:

Hadoop 2.9.2 (some line numbers may not match upstream because we have merged some 3.0+ patches)

Security with Kerberos

Configured according to [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/Federation.html]
h2. Java:

Java(TM) SE Runtime Environment (build 1.8.0_77-b03)

Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)

Kerberos:

 

 
            Reporter: qiwei huang


{{The NM retries indefinitely to connect to the wrong RM's resource tracker port:}}
{quote}{{INFO [main:RetryInvocationHandler@411] - java.net.ConnectException: Call From standby.rm.server/10.122.138.139 to standby.rm.server:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ResourceTrackerPBClientImpl.registerNodeManager over dev1 after 19 failover attempts. Trying to failover after sleeping for 40497ms.}}
{quote}
 

{{After changing *yarn.client.failover-proxy-provider* to *org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider*, the NodeManager cannot find the right ResourceTracker address:}}
{quote}{{getRMHAId:233, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getConfKeyForRMInstance:294, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getConfValueForRMInstance:302, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getConfValueForRMInstance:314, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getSocketAddr:3341, YarnConfiguration (org.apache.hadoop.yarn.conf)}}
{{getRMAddress:77, ServerRMProxy (org.apache.hadoop.yarn.server.api)}}
{{run:144, FederationRMFailoverProxyProvider$1 (org.apache.hadoop.yarn.server.federation.failover)}}
{{doPrivileged:-1, AccessController (java.security)}}
{{doAs:422, Subject (javax.security.auth)}}
{{doAs:1893, UserGroupInformation (org.apache.hadoop.security)}}
{{getProxyInternal:141, FederationRMFailoverProxyProvider (org.apache.hadoop.yarn.server.federation.failover)}}
{{performFailover:192, FederationRMFailoverProxyProvider (org.apache.hadoop.yarn.server.federation.failover)}}
{{failover:217, RetryInvocationHandler$ProxyDescriptor (org.apache.hadoop.io.retry)}}
{{processRetryInfo:149, RetryInvocationHandler$Call (org.apache.hadoop.io.retry)}}
{{processWaitTimeAndRetryInfo:142, RetryInvocationHandler$Call (org.apache.hadoop.io.retry)}}
{{invokeOnce:107, RetryInvocationHandler$Call (org.apache.hadoop.io.retry)}}
{{invoke:359, RetryInvocationHandler (org.apache.hadoop.io.retry)}}
{{registerNodeManager:-1, $Proxy85 (com.sun.proxy)}}
{{registerWithRM:378, NodeStatusUpdaterImpl (org.apache.hadoop.yarn.server.nodemanager)}}
{{serviceStart:252, NodeStatusUpdaterImpl (org.apache.hadoop.yarn.server.nodemanager)}}
{{start:194, AbstractService (org.apache.hadoop.service)}}
{{serviceStart:121, CompositeService (org.apache.hadoop.service)}}
{{start:194, AbstractService (org.apache.hadoop.service)}}
{{initAndStartNodeManager:864, NodeManager (org.apache.hadoop.yarn.server.nodemanager)}}
{{main:931, NodeManager (org.apache.hadoop.yarn.server.nodemanager)}}
{quote}
{{The provider tries to find the active RM address at}} *{{getRMHAId:233}}*{{, but it cannot find the right address because this method can only return the RM whose address is local:}}
{quote}{{if (!s.isUnresolved() && NetUtils.isLocalAddress(s.getAddress())) {}}
{{ currentRMId = rmId.trim();}}
{{ found++;}}
{{}}}
{quote}
{{If the NM and an RM are on the same node and that RM is in standby state, the NM will keep sending RPC calls to that standby RM indefinitely.}}
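
The selection behavior described above can be illustrated with a minimal, self-contained sketch. It is not the Hadoop implementation: the class LocalRmIdSketch, the method pickLocalRmId, the RM ids rm1/rm2 and the host name active.rm.server are hypothetical, and a plain host-name comparison stands in for NetUtils.isLocalAddress. How the real method treats zero or multiple matches is simplified here to returning null.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the local-address check quoted above.
// It only models "pick the RM id whose address is local to this host".
class LocalRmIdSketch {

    // Returns the single RM id whose host is in localHosts.
    // Simplification (assumption): returns null for zero or several matches.
    static String pickLocalRmId(Map<String, String> rmHosts, Set<String> localHosts) {
        String currentRmId = null;
        int found = 0;
        for (Map.Entry<String, String> entry : rmHosts.entrySet()) {
            // Stand-in for: !s.isUnresolved() && NetUtils.isLocalAddress(s.getAddress())
            if (localHosts.contains(entry.getValue())) {
                currentRmId = entry.getKey().trim();
                found++;
            }
        }
        return found == 1 ? currentRmId : null;
    }

    public static void main(String[] args) {
        Map<String, String> rmHosts = new LinkedHashMap<>();
        rmHosts.put("rm1", "active.rm.server");   // hypothetical active RM host
        rmHosts.put("rm2", "standby.rm.server");  // standby RM host, as in the log

        // The NM runs on the same host as the standby RM.
        Set<String> nmLocalHosts = Collections.singleton("standby.rm.server");

        // The local match always wins, regardless of which RM is active.
        System.out.println(pickLocalRmId(rmHosts, nmLocalHosts)); // prints "rm2"
    }
}
```

In this sketch the result depends only on which RM shares a host with the caller, never on which RM is active, which matches the reported symptom: a NodeManager colocated with the standby RM keeps resolving the standby's ResourceTracker address.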


