Posted to hdfs-dev@hadoop.apache.org by "Leon zhang (JIRA)" <ji...@apache.org> on 2019/03/18 23:40:00 UTC

[jira] [Created] (HDFS-14376) Yarn Client may use stale DNS to connect to RM

Leon zhang created HDFS-14376:
---------------------------------

             Summary: Yarn Client may use stale DNS to connect to RM
                 Key: HDFS-14376
                 URL: https://issues.apache.org/jira/browse/HDFS-14376
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: caching
    Affects Versions: 2.9.1
            Reporter: Leon zhang


This happens more frequently when running Yarn in Kubernetes. When the Yarn client tries to connect to the RM and the RM's DNS name is not resolvable, because kube-dns has failed or is not ready, the client initializes itself with an unresolved InetSocketAddress in RMProxy#newProxyInstance(). The connection to the RM then fails with UnknownHostException. The Yarn client retries the connection through RetryProxy, but it always uses the cached unresolved InetSocketAddress, so the retries never succeed. The same bug occurs when the RM is rescheduled to another Kubernetes node and its IP changes. Currently the only workaround is to restart the Yarn client.
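
The caching behaviour described above can be reproduced with the JDK alone. The following is a minimal sketch, not the Hadoop code path: the class name StaleAddressDemo and the helper refresh() are hypothetical, port 8032 is only illustrative, and an IP literal stands in for the RM host name so that the example needs no DNS server. It shows that an InetSocketAddress that is unresolved stays unresolved for its whole lifetime, and that a new lookup only happens when a new object is constructed.

```java
import java.net.InetSocketAddress;

public class StaleAddressDemo {

    // Hypothetical helper: building a new InetSocketAddress from the host
    // string and port triggers a fresh lookup. The old object is immutable
    // and is never updated in place.
    static InetSocketAddress refresh(InetSocketAddress addr) {
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    public static void main(String[] args) {
        // Simulates the state left behind by a failed lookup at client
        // start-up. createUnresolved() performs no lookup at all.
        // Assumption: "127.0.0.1" stands in for the RM host name.
        InetSocketAddress cached =
            InetSocketAddress.createUnresolved("127.0.0.1", 8032);

        // However often the cached object is reused, it stays unresolved.
        System.out.println("cached unresolved: " + cached.isUnresolved());

        // Only a newly constructed address carries a resolved InetAddress.
        InetSocketAddress fresh = refresh(cached);
        System.out.println("fresh unresolved:  " + fresh.isUnresolved());
    }
}
```

A retry loop that keeps passing the cached object to the connection code therefore fails the same way on every attempt, even after DNS has recovered.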

This issue happens in both HA and non-HA RM setups. HDFS has similar issues: [https://github.com/apache-spark-on-k8s/kubernetes-HDFS/issues/48]

I propose adding a new RMFailoverProxyProvider called AutoRefreshRMFailoverProxyProvider, which resolves the DNS name in its overridden getProxy() method. This way, RetryProxy can re-resolve the DNS name each time it retries.
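
As a rough illustration of the idea, the sketch below re-resolves the address on every getProxy() call and rebuilds the proxy only when the result differs from the previous one. It is a standalone approximation under stated assumptions, not the proposed AutoRefreshRMFailoverProxyProvider: the class AutoRefreshSketch and its proxyFactory parameter are hypothetical, it does not implement the Hadoop RMFailoverProxyProvider interface, and the IP literal and port 8032 are placeholders.

```java
import java.net.InetSocketAddress;
import java.util.function.Function;

// Hypothetical stand-in for a proxy provider; not a Hadoop class.
public class AutoRefreshSketch<T> {
    private final String host;
    private final int port;
    // Assumption: builds a client proxy for a given socket address.
    private final Function<InetSocketAddress, T> proxyFactory;

    private InetSocketAddress lastAddress;
    private T proxy;

    public AutoRefreshSketch(String host, int port,
                             Function<InetSocketAddress, T> proxyFactory) {
        this.host = host;
        this.port = port;
        this.proxyFactory = proxyFactory;
    }

    // Called on every attempt, so a retry sees the current DNS answer
    // instead of whatever was cached when the client started.
    public synchronized T getProxy() {
        // The constructor attempts a new lookup; on failure the address
        // is simply unresolved and the next call tries again.
        InetSocketAddress current = new InetSocketAddress(host, port);
        if (proxy == null || !current.equals(lastAddress)) {
            proxy = proxyFactory.apply(current);
            lastAddress = current;
        }
        return proxy;
    }

    public static void main(String[] args) {
        AutoRefreshSketch<String> provider = new AutoRefreshSketch<>(
            "127.0.0.1", 8032,
            addr -> "proxy to " + addr.getHostString() + ":" + addr.getPort());
        System.out.println(provider.getProxy());
        // Same address as before, so the existing proxy is reused.
        System.out.println(provider.getProxy() == provider.getProxy());
    }
}
```

Rebuilding the proxy only when the address changes keeps the common case cheap. Note that the JVM keeps its own lookup cache (the networkaddress.cache.ttl and networkaddress.cache.negative.ttl security properties), so a changed RM IP may still take a short time to become visible to the client.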



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org