Posted to yarn-issues@hadoop.apache.org by "Íñigo Goiri (JIRA)" <ji...@apache.org> on 2019/03/19 02:20:00 UTC

[jira] [Moved] (YARN-9399) Yarn Client may use stale DNS to connect to RM

     [ https://issues.apache.org/jira/browse/YARN-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Íñigo Goiri moved HDFS-14376 to YARN-9399:
------------------------------------------

    Affects Version/s:     (was: 2.9.1)
                       2.9.1
     Target Version/s:   (was: 3.1.0, 2.9.1)
          Component/s:     (was: caching)
                  Key: YARN-9399  (was: HDFS-14376)
              Project: Hadoop YARN  (was: Hadoop HDFS)

> Yarn Client may use stale DNS to connect to RM
> ----------------------------------------------
>
>                 Key: YARN-9399
>                 URL: https://issues.apache.org/jira/browse/YARN-9399
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.1
>            Reporter: Leon zhang
>            Priority: Major
>              Labels: patch
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This happens more frequently when running YARN in Kubernetes. When the YARN client tries to connect to the RM and the RM's DNS name is not resolvable because kube-dns has failed or is not ready, the client initializes itself with an unresolved InetSocketAddress in RMProxy#newProxyInstance(). The connection to the RM then fails with UnknownHostException. The YARN client retries the connection through RetryProxy, but it always uses the cached unresolved InetSocketAddress, so the retry never succeeds. The same bug occurs when the RM is rescheduled to another Kubernetes node, which changes the RM's IP. Currently the workaround is to restart the YARN client.
> This issue happens in both HA and non-HA RM configurations. HDFS has similar issues: [https://github.com/apache-spark-on-k8s/kubernetes-HDFS/issues/48]
> I propose adding a new RMFailoverProxyProvider called AutoRefreshRMFailoverProxyProvider, which resolves the DNS name in the overridden getProxy() method. This way, RetryProxy can resolve the DNS name each time it retries.
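
The re-resolution idea in the description can be illustrated with plain java.net, outside of Hadoop. The sketch below is only an illustration under stated assumptions: RefreshingAddress and its methods are hypothetical names, not Hadoop classes, and the proposed AutoRefreshRMFailoverProxyProvider may be implemented differently. It relies on the fact that an InetSocketAddress is immutable, so a hostname that failed to resolve (or resolved to an old IP) can only be refreshed by constructing a new instance.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not part of Hadoop: keeps the host name and port
// and builds a fresh InetSocketAddress whenever refresh() is called.
class RefreshingAddress {
    private final String host;
    private final int port;
    private InetSocketAddress cached;

    RefreshingAddress(String host, int port) {
        this.host = host;
        this.port = port;
        // The constructor attempts a DNS lookup once. If it fails, the
        // resulting address is flagged as unresolved and stays that way.
        this.cached = new InetSocketAddress(host, port);
    }

    // Returns the address from the last lookup without touching DNS.
    // Reusing this value on every retry mirrors the behavior reported
    // in the issue.
    synchronized InetSocketAddress cached() {
        return cached;
    }

    // Performs a new lookup and replaces the cached address. A retry
    // loop would call this before each connection attempt.
    synchronized InetSocketAddress refresh() {
        cached = new InetSocketAddress(host, port);
        return cached;
    }
}
```

A caller would check `refresh().isUnresolved()` before connecting and back off if the name still does not resolve. Note that the JVM keeps its own DNS cache (controlled by the `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties), so a fresh InetSocketAddress may still return a cached result until those entries expire.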



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org