Posted to yarn-dev@hadoop.apache.org by "Yong Xing (Jira)" <ji...@apache.org> on 2019/12/19 10:29:00 UTC

[jira] [Created] (YARN-10046) RM failed to transition to Active because of App recovery throwing java.lang.NullPointerException

Yong Xing created YARN-10046:
--------------------------------

             Summary: RM failed to transition to Active because of App recovery throwing java.lang.NullPointerException
                 Key: YARN-10046
                 URL: https://issues.apache.org/jira/browse/YARN-10046
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 3.0.0
            Reporter: Yong Xing


 

CDH Distribution: Hadoop 3.0.0-cdh6.0.1

The exception stack trace is as follows:

2019-12-12 17:09:41,422 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to load/recover state
java.lang.NullPointerException
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:521)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1221)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:130)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1265)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1206)
 at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
 at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
 at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:907)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:116)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:1046)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$2000(RMAppImpl.java:118)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1110)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1051)
 at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
 at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
 at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:875)
 at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:357)
 at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:544)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1393)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:758)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1146)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1186)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1182)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1726)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1182)
 at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
 at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
 at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
 at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
 at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:592)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)

 

During the recovery of application attempts, the state of one app attempt is null. The following log line shows the affected application being recovered with final state NONE:

2019-12-12 17:09:41,381 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: application_1576136386231_0742 with 1 attempts and final state = NONE

 

The corresponding code in rmapp/RMAppImpl.java is:

if (recoveredFinalState == null) {
  LOG.info(String.format(RECOVERY_MESSAGE, getApplicationId(),
      appState.getAttemptCount(), "NONE"));
} else if (LOG.isDebugEnabled()) {
  LOG.debug(String.format(RECOVERY_MESSAGE, getApplicationId(),
      appState.getAttemptCount(), recoveredFinalState));
}

 

In rmapp/attempt/RMAppAttemptImpl.java, the following code switches on an RMAppAttemptState value, which is null in this case:

 

private static class BaseFinalTransition extends BaseTransition {

  private final RMAppAttemptState finalAttemptState;

  public BaseFinalTransition(RMAppAttemptState finalAttemptState) {
    this.finalAttemptState = finalAttemptState;
  }

  @Override
  public void transition(RMAppAttemptImpl appAttempt,
      RMAppAttemptEvent event) {
    ApplicationAttemptId appAttemptId = appAttempt.getAppAttemptId();

    // Tell the AMS. Unregister from the ApplicationMasterService
    appAttempt.masterService.unregisterAttempt(appAttemptId);

    // Tell the application and the scheduler
    ApplicationId applicationId = appAttemptId.getApplicationId();
    RMAppEvent appEvent = null;
    boolean keepContainersAcrossAppAttempts = false;
    switch (finalAttemptState) {
    case FINISHED:
    {

In the switch statement, java.lang.NullPointerException is thrown because finalAttemptState is null.
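
The behaviour described above can be reproduced outside YARN. The following is only a minimal sketch: NullEnumSwitchDemo, AttemptState, describeUnsafe and describeSafe are hypothetical names made up for illustration and are not part of the ResourceManager code. It shows that a classic Java switch on a null enum reference throws java.lang.NullPointerException, and that an explicit null check before the switch avoids it.

```java
public class NullEnumSwitchDemo {

    // Hypothetical stand-in for an attempt-state enum; not the real RMAppAttemptState.
    enum AttemptState { FINISHED, FAILED, KILLED }

    // Switches directly on the enum reference. If state is null, the switch
    // itself throws NullPointerException before any case label is evaluated.
    static String describeUnsafe(AttemptState state) {
        switch (state) {
            case FINISHED:
                return "finished";
            case FAILED:
                return "failed";
            default:
                return "other";
        }
    }

    // Same logic, but a null state is handled explicitly before the switch.
    static String describeSafe(AttemptState state) {
        if (state == null) {
            return "NONE";
        }
        return describeUnsafe(state);
    }

    public static void main(String[] args) {
        try {
            describeUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("switch on null threw NullPointerException");
        }
        System.out.println(describeSafe(null));
    }
}
```

Whether a null check of this kind is the right fix for the recovery path is an assumption; the sketch only demonstrates the language behaviour that the report relies on.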


