Posted to dev@myriad.apache.org by "DarinJ (JIRA)" <ji...@apache.org> on 2015/10/21 21:29:28 UTC
[jira] [Commented] (MYRIAD-155) Relaunched NM on same node caused NullPointerException while yarn containers were running previously.
[ https://issues.apache.org/jira/browse/MYRIAD-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967719#comment-14967719 ]
DarinJ commented on MYRIAD-155:
-------------------------------
Myriad-160
> Relaunched NM on same node caused NullPointerException while yarn containers were running previously.
> -----------------------------------------------------------------------------------------------------
>
> Key: MYRIAD-155
> URL: https://issues.apache.org/jira/browse/MYRIAD-155
> Project: Myriad
> Issue Type: Bug
> Reporter: Sarjeet Singh
>
> This seems to be a YARN issue (YARN-2441) that occurs when the NM is relaunched on the same node where containers were previously active/running.
> 15/10/15 10:43:18 INFO ipc.Server: Socket Reader #1 for port 31000: readAndProcess from client 10.10.101.113 threw exception [java.lang.NullPointerException]
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
> at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
> at org.apache.hadoop.security.rpcauth.DigestAuthMethod$SaslDigestCallbackHandler.getPassword(DigestAuthMethod.java:212)
> at org.apache.hadoop.security.rpcauth.DigestAuthMethod$SaslDigestCallbackHandler.handle(DigestAuthMethod.java:238)
> at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
> at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
> at org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1393)
> at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1370)
> at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1283)
> at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1246)
> at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1896)
> at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1764)
> at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1528)
> at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:774)
> at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:640)
> at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:611)
> 15/10/15 10:43:22 INFO security.NMContainerTokenSecretManager: Updating node address : qa101-116.qa.lab:31000
> The issue is that the AM tries to connect to the NM before the NM has finished registering with the RM.
> Myriad can solve this by picking the NM ports randomly from the list of
> ports it receives from Mesos, so that the RM can differentiate between the NMs.
> That is, we can select the NM ports at random instead of selecting the first few ports, as currently implemented here:
> https://github.com/apache/incubator-myriad/blob/master/myriad-scheduler/src/main/java/com/ebay/myriad/scheduler/NMPorts.java#L46
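A minimal sketch of the random selection proposed above, for illustration only. It assumes the ports offered by Mesos have already been flattened into a plain list of port numbers; the class name RandomPortPicker and the method pick are hypothetical and are not part of Myriad's NMPorts class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not Myriad code: chooses NM ports at random
// from the offered ports instead of always taking the first few.
final class RandomPortPicker {

    private RandomPortPicker() {
    }

    /**
     * Returns `count` distinct ports chosen at random from `offered`.
     * The caller's list is not modified.
     */
    static List<Long> pick(List<Long> offered, int count, Random rng) {
        if (count < 0 || count > offered.size()) {
            throw new IllegalArgumentException(
                "Need " + count + " ports but only " + offered.size() + " were offered");
        }
        // Shuffle a copy, then take the first `count` entries.
        List<Long> shuffled = new ArrayList<>(offered);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, count));
    }
}
```

With this approach a relaunched NM is unlikely to receive exactly the same ports as its predecessor when the offer contains many more ports than are needed, but a collision is still possible, so this reduces the chance of the problem rather than ruling it out.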