Posted to dev@tez.apache.org by "Ahmed Hussein (Jira)" <ji...@apache.org> on 2021/11/13 15:20:00 UTC

[jira] [Created] (TEZ-4349) DAGClient gets stuck with invalid cached DAGStatus

Ahmed Hussein created TEZ-4349:
----------------------------------

             Summary: DAGClient gets stuck with invalid cached DAGStatus
                 Key: TEZ-4349
                 URL: https://issues.apache.org/jira/browse/TEZ-4349
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Ahmed Hussein
            Assignee: Ahmed Hussein


I found that some Oozie launchers get stuck waiting for the job to complete.
After investigating, I found that {{dagClient.getDAGStatus(null)}} calls the overload {{dagClient.getDAGStatus(null, 0)}}, which in turn calls {{getDAGStatusInternal}}, which relies on the {{cachedDagStatus}} field.

The {{cachedDagStatus}} is never updated, causing the launcher to wait indefinitely.
 [https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java#L212]
{code:java}
      if (!dagCompleted) {
        if (dagStatus != null) {
          cachedDagStatus = dagStatus;
          return dagStatus;
        }
        if (cachedDagStatus != null) {
          // could not get from AM (not reachable/ was killed). return cached status.
          return cachedDagStatus;
        }
      }
{code}
+To Fix:+
 The {{cachedDagStatus}} should be valid only for a certain amount of time or a certain number of retries.

When the {{cachedDagStatus}} expires, the DAGClient tries to pull the status from the AM or the RM.
If fetching the status from both the AM and the RM fails, null is returned to the caller.
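
A rough sketch of the expiry idea, for illustration only. The class {{ExpiringStatusCache}}, its method {{resolve}}, and the parameters {{maxAgeMillis}} and {{maxMisses}} are hypothetical names and are not part of DAGClientImpl; the actual fix may look quite different. The clock is injected so that expiry can be exercised without sleeping.

```java
/**
 * Hypothetical helper (not a Tez class): holds the last good value and
 * lets it expire by age or by number of consecutive failed fetches.
 */
final class ExpiringStatusCache<T> {
  private final long maxAgeMillis;   // assumed limit on how old a cached value may be
  private final int maxMisses;       // assumed limit on consecutive failed fetches
  private final java.util.function.LongSupplier clock; // returns "now" in milliseconds

  private T cached;
  private long cachedAtMillis;
  private int misses;

  ExpiringStatusCache(long maxAgeMillis, int maxMisses,
                      java.util.function.LongSupplier clock) {
    this.maxAgeMillis = maxAgeMillis;
    this.maxMisses = maxMisses;
    this.clock = clock;
  }

  /**
   * @param fresh value just fetched (for example from the AM or the RM),
   *              or null if the fetch failed
   * @return fresh if non-null; otherwise the cached value while it is
   *         still valid; otherwise null
   */
  synchronized T resolve(T fresh) {
    if (fresh != null) {
      // Successful fetch: refresh the cache and reset the miss counter.
      cached = fresh;
      cachedAtMillis = clock.getAsLong();
      misses = 0;
      return fresh;
    }
    if (cached == null) {
      return null; // nothing cached, nothing fetched
    }
    misses++;
    boolean tooOld = clock.getAsLong() - cachedAtMillis > maxAgeMillis;
    boolean tooManyMisses = misses > maxMisses;
    if (tooOld || tooManyMisses) {
      cached = null; // expire the stale value so the caller sees the failure
      return null;
    }
    return cached;
  }
}
```

With something like this, the client would pass the result of each fetch attempt through {{resolve}}. A caller that keeps receiving the same stale status would eventually receive null instead and could stop waiting.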



--
This message was sent by Atlassian Jira
(v8.20.1#820001)