Posted to yarn-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/31 19:07:00 UTC

[jira] [Updated] (YARN-6583) Hadoop-sls failed to start because of premature state of RM

     [ https://issues.apache.org/jira/browse/YARN-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-6583:
---------------------------------
    Labels: easyfix pull-request-available  (was: easyfix)

> Hadoop-sls failed to start because of premature state of RM
> -----------------------------------------------------------
>
>                 Key: YARN-6583
>                 URL: https://issues.apache.org/jira/browse/YARN-6583
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler-load-simulator
>    Affects Versions: 2.6.0
>            Reporter: JayceAu
>            Priority: Major
>              Labels: easyfix, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> During SLS startup, after startRM() returns in SLSRunner.start(), BaseContainerTokenSecretManager may not yet have generated its own internal key, or the key may not yet be visible to other threads. NM registration then fails with the following exception, and the whole SLS process crashes.
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:81)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:300)
>         at org.apache.hadoop.yarn.sls.nodemanager.NMSimulator.init(NMSimulator.java:105)
>         at org.apache.hadoop.yarn.sls.SLSRunner.startNM(SLSRunner.java:202)
>         at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:143)
>         at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
> 17/05/11 10:21:06 INFO resourcemanager.ResourceManager: Recovery started
> 17/05/11 10:21:06 INFO recovery.ZKRMStateStore: Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
> {noformat}
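
The race described above can be illustrated with a minimal, self-contained sketch. It is not the fix from the YARN-6583 pull request and uses no Hadoop classes: ReadinessWait, awaitNonNull, and the simulated "master-key-1" value are hypothetical names introduced only for illustration. The idea, assuming the key is published by another thread some time after startup returns, is to poll for it with a timeout instead of dereferencing it immediately.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names exist in Hadoop.
class ReadinessWait {

    /**
     * Polls the supplier until it returns a non-null value or the timeout
     * elapses. Returns the value, or throws TimeoutException.
     */
    static <T> T awaitNonNull(Supplier<T> supplier, long timeoutMs, long pollMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            T value = supplier.get();
            if (value != null) {
                return value;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("value not available after " + timeoutMs + " ms");
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stands in for the secret manager's current key, which is still
        // null immediately after the (simulated) RM start call returns.
        AtomicReference<String> currentKey = new AtomicReference<>();

        // Simulated RM thread that publishes the key slightly later.
        Thread rm = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            currentKey.set("master-key-1");
        });
        rm.start();

        // Simulated NM registration: wait for the key rather than reading
        // it right away and hitting a NullPointerException.
        String key = awaitNonNull(currentKey::get, 5000, 10);
        System.out.println("key available: " + key);
        rm.join();
    }
}
```

AtomicReference is used so that the value written by the simulated RM thread is guaranteed to be visible to the polling thread, which covers the "not yet exposed to the other thread" case as well as the "not yet generated" case.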



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
