Posted to yarn-dev@hadoop.apache.org by "Prabhu Joseph (Jira)" <ji...@apache.org> on 2021/07/23 09:25:00 UTC

[jira] [Created] (YARN-10871) Aborted AM is considered as App Failure when user sets MaxAttempts as 1

Prabhu Joseph created YARN-10871:
------------------------------------

             Summary: Aborted AM is considered as App Failure when user sets MaxAttempts as 1
                 Key: YARN-10871
                 URL: https://issues.apache.org/jira/browse/YARN-10871
             Project: Hadoop YARN
          Issue Type: Bug
          Components: RM
    Affects Versions: 3.3.1
            Reporter: Prabhu Joseph
            Assignee: Prabhu Joseph


When an AM container is ABORTED due to node decommission, the app attempt is not counted as a failure. But if the user sets the number of attempts to 1, YARN treats the ABORTED AM as a failure.

{code}
      int numberOfFailure = app.getNumFailedAppAttempts();
      if (app.maxAppAttempts == 1) {
        // If the user explicitly set the attempts to 1 then there are likely
        // correctness issues if the AM restarts for any reason.
        LOG.info("Max app attempts is 1 for " + app.applicationId
            + ", preventing further attempts.");
        numberOfFailure = app.maxAppAttempts;
      } 
{code}

Livy sets the number of attempts to 1 because its RPC server does not yet support multiple connections for the same registered app. But in our case the AM was ABORTED before it even started (the AM container was in ACQUIRED state).

Usually users won't decommission a node where the container is in RUNNING state (where the session is established). But decommission can happen on nodes where the container is in ACQUIRED or ALLOCATED state.

I suggest exposing a config that lets the user decide whether to count this as a failure.
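
A minimal sketch of how such a switch could behave, written as a simplified standalone model rather than the actual RMAppImpl code. The class name, the shouldRetry helper and the retryAbortedWhenMaxIsOne flag are all hypothetical; no config property name has been proposed yet, and the final comparison against maxAppAttempts is an assumption about how the failure count is used to decide on a retry.

```java
// Simplified standalone model of the retry decision; NOT the real RMAppImpl.
class AbortedAttemptSketch {

    /**
     * Decides whether another attempt should be started.
     *
     * @param maxAppAttempts            max attempts requested by the user
     * @param numFailedAttempts         attempts counted as failures (an ABORTED
     *                                  AM is assumed not to be counted here)
     * @param lastAttemptAborted        true if the last AM container was ABORTED
     * @param retryAbortedWhenMaxIsOne  hypothetical config flag (assumption)
     */
    static boolean shouldRetry(int maxAppAttempts, int numFailedAttempts,
            boolean lastAttemptAborted, boolean retryAbortedWhenMaxIsOne) {
        int numberOfFailure = numFailedAttempts;
        if (maxAppAttempts == 1) {
            // Existing behavior: force the failure count to the maximum so
            // that no further attempt is made. With the hypothetical flag
            // enabled, an ABORTED attempt is exempt from this override.
            boolean exempt = retryAbortedWhenMaxIsOne && lastAttemptAborted;
            if (!exempt) {
                numberOfFailure = maxAppAttempts;
            }
        }
        // Assumed retry condition: retry only while failures are below max.
        return numberOfFailure < maxAppAttempts;
    }

    public static void main(String[] args) {
        // Flag off: the ABORTED AM ends the app when max attempts is 1.
        System.out.println(shouldRetry(1, 0, true, false));
        // Flag on: the ABORTED AM is not treated as a failure.
        System.out.println(shouldRetry(1, 0, true, true));
        // Flag on, but a genuine failure still ends the app.
        System.out.println(shouldRetry(1, 1, false, true));
    }
}
```

With the flag defaulting to off, existing users such as Livy would keep today's behavior unless they opt in.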


