Posted to issues@tez.apache.org by "Attila Magyar (Jira)" <ji...@apache.org> on 2020/05/14 16:03:00 UTC

[jira] [Created] (TEZ-4181) [Kubernetes] Use hostname + pod UID for shuffle manager caching

Attila Magyar created TEZ-4181:
----------------------------------

             Summary: [Kubernetes] Use hostname + pod UID for shuffle manager caching
                 Key: TEZ-4181
                 URL: https://issues.apache.org/jira/browse/TEZ-4181
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Attila Magyar
            Assignee: Attila Magyar


When a pod restarts, it comes back with the same hostname and shuffle port. Fetcher threads that connect to download shuffle data therefore reuse the cached connection info for that host and port, even though the shuffle data of the pod that died has been cleaned up. The restarted pod then receives connections from clients asking for specific shuffle data, but the daemon no longer has that data because of the restart.

In ShuffleManager.java, the key of knownSrcHosts should be changed to HostInfo, which combines host+port with the host's unique ID. The host's unique ID changes when a node is killed or restarted, so a restarted pod no longer matches the cached entry of its predecessor.
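
A minimal sketch of the proposed key, to illustrate why adding the unique ID separates a restarted pod from the one it replaced. Everything here is an assumption made for illustration: the class HostInfoSketch, the fields and constructor of HostInfo, the String values standing in for per-host fetch state, and the sample hostname, port and pod UIDs are hypothetical and are not taken from the Tez code base.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; not the actual Tez ShuffleManager code. */
final class HostInfoSketch {

    /** Cache key combining host + port with the host's unique ID (e.g. the pod UID). */
    static final class HostInfo {
        private final String host;
        private final int port;
        private final String uniqueId;

        HostInfo(String host, int port, String uniqueId) {
            this.host = Objects.requireNonNull(host);
            this.port = port;
            this.uniqueId = Objects.requireNonNull(uniqueId);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof HostInfo)) return false;
            HostInfo other = (HostInfo) o;
            // All three parts must match; the same host+port with a new ID is a different key.
            return port == other.port
                && host.equals(other.host)
                && uniqueId.equals(other.uniqueId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, uniqueId);
        }

        @Override
        public String toString() {
            return host + ":" + port + "/" + uniqueId;
        }
    }

    public static void main(String[] args) {
        // Stand-in for knownSrcHosts; the String value represents per-host fetch state.
        Map<HostInfo, String> knownSrcHosts = new ConcurrentHashMap<>();

        HostInfo beforeRestart = new HostInfo("tez-pod-0", 15551, "uid-before-restart");
        HostInfo afterRestart = new HostInfo("tez-pod-0", 15551, "uid-after-restart");

        knownSrcHosts.put(beforeRestart, "state for the pod that died");
        // Same hostname and port, but a new unique ID: no stale cache hit.
        System.out.println(knownSrcHosts.containsKey(afterRestart)); // prints "false"

        knownSrcHosts.put(afterRestart, "state for the restarted pod");
        System.out.println(knownSrcHosts.size()); // prints "2"
    }
}
```

With a key of only host+port, the two entries above would collapse into one and fetchers would keep using the state of the pod that died. How the unique ID reaches the fetcher side is outside the scope of this sketch.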

 


