Posted to yarn-dev@hadoop.apache.org by "Xu Cang (Jira)" <ji...@apache.org> on 2020/12/03 23:23:00 UTC
[jira] [Created] (YARN-10516) In HA mode, when one Resource Manager
has networking issue, getTokenService() should not throw runtime exception
Xu Cang created YARN-10516:
------------------------------
Summary: In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception
Key: YARN-10516
URL: https://issues.apache.org/jira/browse/YARN-10516
Project: Hadoop YARN
Issue Type: Improvement
Components: client
Reporter: Xu Cang
We have observed an issue in the YARN client around this piece of code:
[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L145]
When
{code:java}
services.add(SecurityUtil.buildTokenService(
    yarnConf.getSocketAddr(address, defaultAddr, defaultPort))
    .toString());
{code}
is called, "yarnConf.getSocketAddr" throws a runtime exception, more specifically an UnknownHostException, from here: [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L466]
This happened while one of the RM hosts had a networking issue and its IP could not be resolved.
This runtime exception then propagates all the way up into our application and causes the MR job submission to fail.
In my opinion, since we have HA here, multiple RMs are still alive and available. We should catch this exception in getTokenService() and handle it properly.
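One possible shape for that handling is sketched below. It is a standalone illustration, not the Hadoop code: TokenServiceSketch, its buildTokenService stand-in, and the list-of-addresses parameter are hypothetical, and it assumes the failure surfaces as an unchecked exception that wraps an UnknownHostException. The loop skips any RM whose address cannot be resolved and fails only when none of them resolve.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch; it does not use or modify any Hadoop class.
final class TokenServiceSketch {

  // Stand-in for the token service builder (assumption): rejects an
  // unresolved address with an unchecked exception wrapping UnknownHostException.
  static String buildTokenService(InetSocketAddress addr) {
    if (addr.isUnresolved()) {
      throw new IllegalArgumentException(
          new UnknownHostException(addr.getHostName()));
    }
    return addr.getAddress().getHostAddress() + ":" + addr.getPort();
  }

  // Builds the comma-joined service name, skipping RMs that cannot be resolved.
  static String getTokenService(List<InetSocketAddress> rmAddrs) {
    List<String> services = new ArrayList<>();
    for (InetSocketAddress addr : rmAddrs) {
      try {
        services.add(buildTokenService(addr));
      } catch (IllegalArgumentException e) {
        if (!(e.getCause() instanceof UnknownHostException)) {
          throw e; // unrelated failure: do not swallow it
        }
        System.err.println("Skipping unresolvable RM: " + addr.getHostName());
      }
    }
    if (services.isEmpty()) {
      // Every RM failed to resolve, so there is nothing to fall back on.
      throw new IllegalStateException("No RM address could be resolved");
    }
    return String.join(",", services);
  }

  public static void main(String[] args) {
    List<InetSocketAddress> rms = new ArrayList<>();
    rms.add(new InetSocketAddress("127.0.0.1", 8032));
    // createUnresolved simulates an RM host whose name has no IP.
    rms.add(InetSocketAddress.createUnresolved("rm2.invalid", 8032));
    System.out.println(getTokenService(rms));
  }
}
```

Whether a service name that omits the unreachable RM is acceptable for token matching after a failover is an open question that a real patch would need to address.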
I would like to hear your opinion on this. If you agree, I will provide a patch. Thank you.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)