You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Rohith (JIRA)" <ji...@apache.org> on 2014/05/02 13:01:19 UTC

[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

    [ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987590#comment-13987590 ] 

Rohith commented on YARN-2010:
------------------------------

For completed applications before starting in secured mode, clientTokenMaterKey is null. After starting in secured mode, recovery of apps fails since clientTokenMasterKey is null. During recovering application, rm should have intellegence to decide whether recovering applicaiton has run in secured mode or non secured mode. This is possible by checking cilentTokenMasterKey for null. 

> RM can't transition to active if it can't recover an app attempt
> ----------------------------------------------------------------
>
>                 Key: YARN-2010
>                 URL: https://issues.apache.org/jira/browse/YARN-2010
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: bc Wong
>
> If the RM fails to recover an app attempt, it won't come up. We should make it more resilient.
> Specifically, the underlying error is that the app was submitted before Kerberos security got turned on. Makes sense for the app to fail in this case. But YARN should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election 
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active 
> at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118) 
> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804) 
> at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415) 
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode 
> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274) 
> at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116) 
> ... 4 more 
> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument 
> at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) 
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811) 
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842) 
> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265) 
> ... 5 more 
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: java.lang.IllegalArgumentException: Missing argument 
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372) 
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273) 
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406) 
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000) 
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462) 
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
> ... 8 more 
> Caused by: java.lang.IllegalArgumentException: Missing argument 
> at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93) 
> at org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188) 
> at org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInRM.registerMasterKey(ClientToAMTokenSecretManagerInRM.java:49) 
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recoverAppAttemptCredentials(RMAppAttemptImpl.java:711) 
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:689) 
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:663) 
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:369) 
> ... 13 more 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)