Posted to dev@tez.apache.org by "Ahmed Hussein (Jira)" <ji...@apache.org> on 2021/11/13 15:20:00 UTC
[jira] [Created] (TEZ-4349) DAGClient gets stuck with invalid cached DAGStatus
Ahmed Hussein created TEZ-4349:
----------------------------------
Summary: DAGClient gets stuck with invalid cached DAGStatus
Key: TEZ-4349
URL: https://issues.apache.org/jira/browse/TEZ-4349
Project: Apache Tez
Issue Type: Bug
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein
I found that some Oozie launchers get stuck waiting for the job to complete.
After investigation, I found that {{dagClient.getDAGStatus(null)}} calls the overload {{dagClient.getDAGStatus(null, 0)}}, which in turn calls {{getDAGStatusInternal}}, which relies on the {{cachedDagStatus}} field.
The {{cachedDagStatus}} is never updated, causing the launcher to wait indefinitely.
[https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java#L212]
{code:java}
if (!dagCompleted) {
  if (dagStatus != null) {
    cachedDagStatus = dagStatus;
    return dagStatus;
  }
  if (cachedDagStatus != null) {
    // could not get from AM (not reachable/ was killed). return cached status.
    return cachedDagStatus;
  }
}
{code}
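To illustrate the failure mode, here is a minimal sketch that only mimics the fallback above. It is not the Tez code: {{StaleCacheDemo}}, {{State}}, and {{completes}} are hypothetical names, and the AM is modeled as a plain {{Supplier}} that returns null when it cannot be reached.

```java
import java.util.function.Supplier;

/** Simplified, hypothetical stand-in for the fallback logic quoted above. */
final class StaleCacheDemo {
  enum State { RUNNING, SUCCEEDED }

  private State cached; // never expires, like cachedDagStatus in the excerpt

  /** Prefers a fresh status; otherwise falls back to whatever was cached last. */
  State getStatus(Supplier<State> am) {
    State fresh = am.get();
    if (fresh != null) {
      cached = fresh;
      return fresh;
    }
    return cached; // AM unreachable: the stale value is served with no age check
  }

  /** Polls up to maxPolls times and reports whether a terminal state was ever seen. */
  static boolean completes(StaleCacheDemo client, Supplier<State> am, int maxPolls) {
    for (int i = 0; i < maxPolls; i++) {
      if (client.getStatus(am) == State.SUCCEEDED) {
        return true;
      }
    }
    return false;
  }
}
```

Once the client has seen RUNNING and the AM stops answering, every later poll keeps returning RUNNING, so a caller that waits for a terminal state never leaves its loop.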
+To Fix:+
The {{cachedDagStatus}} should be valid only for a certain amount of time, or a certain number of retries.
When the {{cachedDagStatus}} expires, the DAGClient tries to pull the status from the AM or the RM.
If fetching the status from both the AM and the RM fails, the DAGClient would return null to the caller.
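A rough sketch of the time-based variant of this idea, not a patch: {{ExpiringStatusCache}}, {{maxAgeMillis}}, and the injected clock are hypothetical names, not existing Tez classes or settings, and the AM and RM lookups are modeled as plain {{Supplier}}s that return null or throw on failure.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch: a cached status that is only served while it is fresh. */
final class ExpiringStatusCache<S> {
  private final long maxAgeMillis;  // assumed knob, not an existing Tez setting
  private final LongSupplier clock; // injectable so expiry can be tested
  private S cached;
  private long cachedAtMillis;

  ExpiringStatusCache(long maxAgeMillis, LongSupplier clock) {
    this.maxAgeMillis = maxAgeMillis;
    this.clock = clock;
  }

  /**
   * Tries the AM first, then the RM. A non-null result refreshes the cache.
   * If both fail, the cached value is returned only while it is still fresh;
   * once it has expired, the cache is cleared and null is returned.
   */
  synchronized S get(Supplier<S> fromAm, Supplier<S> fromRm) {
    S fresh = tryFetch(fromAm);
    if (fresh == null) {
      fresh = tryFetch(fromRm);
    }
    if (fresh != null) {
      cached = fresh;
      cachedAtMillis = clock.getAsLong();
      return fresh;
    }
    if (cached != null && clock.getAsLong() - cachedAtMillis <= maxAgeMillis) {
      return cached;
    }
    cached = null;
    return null;
  }

  /** Treats an exception from a source the same as "no status available". */
  private static <T> T tryFetch(Supplier<T> source) {
    try {
      return source.get();
    } catch (RuntimeException e) {
      return null;
    }
  }
}
```

With something like this, a caller polling in a loop would eventually see null instead of a stale RUNNING status and could decide to fail or retry. A retry-count limit could replace the age check in the same place.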
--
This message was sent by Atlassian Jira
(v8.20.1#820001)