Posted to hdfs-dev@hadoop.apache.org by "Leon zhang (JIRA)" <ji...@apache.org> on 2019/03/18 23:40:00 UTC
[jira] [Created] (HDFS-14376) Yarn Client may use stale DNS to connect to RM
Leon zhang created HDFS-14376:
---------------------------------
Summary: Yarn Client may use stale DNS to connect to RM
Key: HDFS-14376
URL: https://issues.apache.org/jira/browse/HDFS-14376
Project: Hadoop HDFS
Issue Type: Bug
Components: caching
Affects Versions: 2.9.1
Reporter: Leon zhang
This happens more frequently when running Yarn in Kubernetes. When the Yarn client tries to connect to the RM and the RM's DNS name is not resolvable, because kube-dns has failed or is not ready, the client initializes itself with an unresolved InetSocketAddress in RMProxy#newProxyInstance(). The connection to the RM then fails with UnknownHostException. The client retries the connection through RetryProxy, but it always uses the cached unresolved InetSocketAddress, so the retry never succeeds. The bug also occurs when the RM is rescheduled to another Kubernetes node, which changes the RM's IP address. Currently the workaround is to restart the Yarn client.
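The caching problem can be reproduced with the JDK alone. The following is a minimal sketch, not code from Hadoop: the class name StaleAddressDemo and the helper refreshIfUnresolved are hypothetical, and the failed lookup is simulated with InetSocketAddress.createUnresolved(). A literal IP is used as the host so that the example runs without a DNS server, and port 8032 is only illustrative. An InetSocketAddress is immutable, so an unresolved instance stays unresolved until a new instance is constructed.

```java
import java.net.InetSocketAddress;

public class StaleAddressDemo {

    /**
     * Hypothetical helper: returns the cached address if it is already
     * resolved, otherwise constructs a new one, which triggers a new lookup.
     */
    static InetSocketAddress refreshIfUnresolved(InetSocketAddress cached) {
        if (!cached.isUnresolved()) {
            return cached;
        }
        // The (host, port) constructor attempts resolution at construction time.
        return new InetSocketAddress(cached.getHostString(), cached.getPort());
    }

    public static void main(String[] args) {
        // Simulates the address captured while the lookup was failing.
        InetSocketAddress cached =
            InetSocketAddress.createUnresolved("127.0.0.1", 8032);

        // The cached object never re-resolves on its own.
        System.out.println("cached unresolved: " + cached.isUnresolved());

        // Only a newly constructed address picks up the current state.
        InetSocketAddress fresh = refreshIfUnresolved(cached);
        System.out.println("fresh unresolved:  " + fresh.isUnresolved());
    }
}
```

Any retry loop that keeps reusing the first object behaves like the cached variable here, regardless of how many times it retries.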
This issue happens in both HA and non-HA RM setups. HDFS has similar issues: [https://github.com/apache-spark-on-k8s/kubernetes-HDFS/issues/48]
I propose adding a new RMFailoverProxyProvider called AutoRefreshRMFailoverProxyProvider, which will resolve the DNS name in its overridden getProxy() method. This way, RetryProxy can re-resolve the DNS name each time it retries.
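The idea behind the proposal can be sketched without any Hadoop classes. Everything below is an assumption made for illustration: AddressProvider is a hypothetical stand-in for the proxy provider (not the Hadoop interface), flakyDns simulates a DNS service that fails a fixed number of times before recovering, and "rm.example.invalid" is a placeholder host name. The caching provider looks up the address once, while the auto-refresh provider looks it up on every call, which is what the retry loop needs.

```java
import java.net.InetSocketAddress;
import java.util.function.Supplier;

public class AutoRefreshSketch {

    /** Hypothetical stand-in for a proxy provider; not the Hadoop API. */
    interface AddressProvider {
        InetSocketAddress getAddress();
    }

    /** Resolves once at construction and reuses the result forever. */
    static final class CachingProvider implements AddressProvider {
        private final InetSocketAddress cached;
        CachingProvider(Supplier<InetSocketAddress> resolver) {
            this.cached = resolver.get();
        }
        public InetSocketAddress getAddress() { return cached; }
    }

    /** Resolves again on every call, as the proposal suggests. */
    static final class AutoRefreshProvider implements AddressProvider {
        private final Supplier<InetSocketAddress> resolver;
        AutoRefreshProvider(Supplier<InetSocketAddress> resolver) {
            this.resolver = resolver;
        }
        public InetSocketAddress getAddress() { return resolver.get(); }
    }

    /** Returns the 1-based attempt that got a resolved address, or -1. */
    static int connectWithRetry(AddressProvider provider, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!provider.getAddress().isUnresolved()) {
                return attempt;
            }
        }
        return -1;
    }

    /** Simulated DNS: the first `failures` lookups fail, later ones succeed. */
    static Supplier<InetSocketAddress> flakyDns(int failures) {
        int[] calls = {0};
        return () -> calls[0]++ < failures
            ? InetSocketAddress.createUnresolved("rm.example.invalid", 8032)
            : new InetSocketAddress("127.0.0.1", 8032);
    }

    public static void main(String[] args) {
        System.out.println("caching:      "
            + connectWithRetry(new CachingProvider(flakyDns(2)), 5));
        System.out.println("auto-refresh: "
            + connectWithRetry(new AutoRefreshProvider(flakyDns(2)), 5));
    }
}
```

In this sketch the caching provider never succeeds within five attempts, while the auto-refresh provider succeeds on the first attempt after the simulated DNS recovers. The cost is one lookup per retry, which may matter if retries are frequent.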
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org