Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2016/09/01 22:50:20 UTC

[jira] [Comment Edited] (TEZ-3426) Second AM attempt launched for session mode and recovery disabled for certain cases

    [ https://issues.apache.org/jira/browse/TEZ-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456857#comment-15456857 ] 

Hitesh Shah edited comment on TEZ-3426 at 9/1/16 10:49 PM:
-----------------------------------------------------------

+1. Is a unit test possible for this check? 

Not sure if the error message can be made a bit more clear for non-Tez devs.

Maybe something like: "Initial application attempt in session mode failed. Application cannot recover and continue properly as DAG recovery has been disabled" ?

A bit more verbose I guess. Am okay with any version of the error message going in. 


was (Author: hitesh):
+1. 

Not sure if the error message can be made a bit more clear for non-Tez devs.

Maybe something like: "Initial application attempt in session mode failed. Application cannot recover and continue properly as DAG recovery has been disabled" ?

A bit more verbose I guess. Am okay with any version of the error message going in. 

> Second AM attempt launched for session mode and recovery disabled for certain cases
> -----------------------------------------------------------------------------------
>
>                 Key: TEZ-3426
>                 URL: https://issues.apache.org/jira/browse/TEZ-3426
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: TEZ-3426.001.patch, TEZ-3426.002.patch
>
>
> ApplicationSubmissionContext#setMaxAppAttempts does not fully guarantee that there will be at most that many attempts. A few exceptional cases are not counted toward the limit, as the code below shows. Tez should protect itself from accidentally starting a second attempt in session mode when recovery is disabled, since that second attempt will always succeed with no work to do.
> {code}
>   @Override
>   public boolean shouldCountTowardsMaxAttemptRetry() {
>     try {
>       this.readLock.lock();
>       int exitStatus = getAMContainerExitStatus();
>       return !(exitStatus == ContainerExitStatus.PREEMPTED
>           || exitStatus == ContainerExitStatus.ABORTED
>           || exitStatus == ContainerExitStatus.DISKS_FAILED
>           || exitStatus == ContainerExitStatus.KILLED_BY_RESOURCEMANAGER);
>     } finally {
>       this.readLock.unlock();
>     }
>   }
> {code}
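
For illustration only, the kind of check being discussed could look roughly like the sketch below. This is not the code in the attached patches: the class name `SessionAttemptGuard`, the method `check`, and its three parameters are hypothetical, and the message text is simply the wording suggested in the comment above. The assumption is that the AM knows whether it is in session mode, whether DAG recovery is enabled, and its own attempt number (1 for the initial attempt).

```java
import java.util.Optional;

/**
 * Hypothetical guard, for illustration only. The names here are not Tez's
 * actual API; they just model the condition described in this issue.
 */
final class SessionAttemptGuard {

    // Wording taken from the suggestion in the review comment.
    static final String ERROR_MESSAGE =
        "Initial application attempt in session mode failed. "
            + "Application cannot recover and continue properly as DAG recovery has been disabled";

    private SessionAttemptGuard() {
    }

    /**
     * Returns an error message if this AM attempt should fail fast,
     * or an empty Optional if the attempt may proceed.
     *
     * @param sessionMode     true if the AM runs in session mode (assumed input)
     * @param recoveryEnabled true if DAG recovery is enabled (assumed input)
     * @param attemptNumber   1 for the initial attempt, 2 for the first retry, ...
     */
    static Optional<String> check(boolean sessionMode, boolean recoveryEnabled, int attemptNumber) {
        boolean isRetry = attemptNumber > 1;
        if (sessionMode && !recoveryEnabled && isRetry) {
            // A later attempt here would have nothing to recover and no work to do.
            return Optional.of(ERROR_MESSAGE);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Initial attempt: allowed to proceed.
        System.out.println(check(true, false, 1).isPresent());
        // Second attempt, session mode, recovery disabled: should fail fast.
        System.out.println(check(true, false, 2).isPresent());
        // Second attempt with recovery enabled: allowed to proceed.
        System.out.println(check(true, true, 2).isPresent());
    }
}
```

Keeping the condition in a small pure function like this is one way to make the check unit-testable without starting an AM, which is what the question about a unit test is getting at. Whether the real check can be isolated this way depends on how the patch is structured.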


