Posted to dev@tez.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2022/01/03 07:41:00 UTC
[jira] [Resolved] (TEZ-4349) DAGClient gets stuck with invalid cached DAGStatus
[ https://issues.apache.org/jira/browse/TEZ-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor resolved TEZ-4349.
-------------------------------
Resolution: Fixed
> DAGClient gets stuck with invalid cached DAGStatus
> --------------------------------------------------
>
> Key: TEZ-4349
> URL: https://issues.apache.org/jira/browse/TEZ-4349
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Fix For: 0.10.2
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> I found that some Oozie launchers get stuck waiting for the job to complete.
> After investigation I found that {{dagClient.getDAGStatus(null)}} calls the overload {{dagClient.getDAGStatus(null, 0)}}, which then calls {{getDAGStatusInternal}}, which relies on the {{cachedDagStatus}} field.
> The {{cachedDagStatus}} is never updated, causing the launcher to wait indefinitely.
> [https://github.com/apache/tez/blob/master/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java#L212]
> {code:java}
> if (!dagCompleted) {
>   if (dagStatus != null) {
>     cachedDagStatus = dagStatus;
>     return dagStatus;
>   }
>   if (cachedDagStatus != null) {
>     // could not get from AM (not reachable/ was killed). return cached status.
>     return cachedDagStatus;
>   }
> }
> {code}
> +To Fix:+
> The {{cachedDagStatus}} should be valid only for a certain amount of time, or for a certain number of retries.
> When the {{cachedDagStatus}} expires, the DAGClient tries to pull the status from the AM or the RM.
> If fetching the status fails from both the AM and the RM, null is returned to the caller.
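
The proposed fix can be sketched roughly as follows. This is only an illustration of the idea, not the patch that was committed for TEZ-4349: the class name ExpiringStatusCache, its parameters, and the fetcher callback (standing in for "ask the AM, then the RM") are all hypothetical and do not exist in Tez.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Tez): a cached value that expires by age or by stale-read count. */
final class ExpiringStatusCache<T> {
    private final long maxAgeMillis;   // assumed limit on how old a cached status may be
    private final int maxStaleReads;   // assumed limit on how many times a stale status may be served
    private final LongSupplier clock;  // injectable clock so the expiry logic can be tested

    private T cached;
    private long cachedAtMillis;
    private int staleReads;

    ExpiringStatusCache(long maxAgeMillis, int maxStaleReads, LongSupplier clock) {
        this.maxAgeMillis = maxAgeMillis;
        this.maxStaleReads = maxStaleReads;
        this.clock = clock;
    }

    /**
     * Calls the fetcher first. A non-null result refreshes the cache.
     * A null result (fetch failed) falls back to the cached value only while
     * it is within both limits; otherwise the cache is dropped and null is returned.
     */
    T get(Supplier<T> fetcher) {
        T fresh = fetcher.get();
        if (fresh != null) {
            cached = fresh;
            cachedAtMillis = clock.getAsLong();
            staleReads = 0;
            return fresh;
        }
        if (cached == null) {
            return null;
        }
        boolean tooOld = clock.getAsLong() - cachedAtMillis > maxAgeMillis;
        boolean tooManyReads = staleReads >= maxStaleReads;
        if (tooOld || tooManyReads) {
            cached = null; // expired: stop serving the stale status
            return null;
        }
        staleReads++;
        return cached;
    }
}
```

With this shape, a caller polling in a loop would eventually see null once the AM and RM stay unreachable past the limits, instead of receiving the same stale status forever. In production the clock argument could be System::currentTimeMillis.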