Posted to dev@myriad.apache.org by "DarinJ (JIRA)" <ji...@apache.org> on 2015/10/21 21:29:28 UTC

[jira] [Commented] (MYRIAD-155) Relaunched NM on same node caused NullPointerException while yarn containers were running previously.

    [ https://issues.apache.org/jira/browse/MYRIAD-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967719#comment-14967719 ] 

DarinJ commented on MYRIAD-155:
-------------------------------

Myriad-160

> Relaunched NM on same node caused NullPointerException while yarn containers were running previously.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MYRIAD-155
>                 URL: https://issues.apache.org/jira/browse/MYRIAD-155
>             Project: Myriad
>          Issue Type: Bug
>            Reporter: Sarjeet Singh
>
> This seems to be a YARN issue (YARN-2441) that occurs when the NM is relaunched on the same node where containers were previously active/running.
> 15/10/15 10:43:18 INFO ipc.Server: Socket Reader #1 for port 31000: readAndProcess from client 10.10.101.113 threw exception [java.lang.NullPointerException]
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
>     at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
>     at org.apache.hadoop.security.rpcauth.DigestAuthMethod$SaslDigestCallbackHandler.getPassword(DigestAuthMethod.java:212)
>     at org.apache.hadoop.security.rpcauth.DigestAuthMethod$SaslDigestCallbackHandler.handle(DigestAuthMethod.java:238)
>     at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
>     at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>     at org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1393)
>     at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1370)
>     at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1283)
>     at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1246)
>     at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1896)
>     at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1764)
>     at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1528)
>     at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:774)
>     at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:640)
>     at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:611)
> 15/10/15 10:43:22 INFO security.NMContainerTokenSecretManager: Updating node address : qa101-116.qa.lab:31000
> The issue is that "AM tries to connect to NM before NM finished registering with RM".
> Myriad can work around this by picking the NM ports at random from the list of ports it receives from Mesos, so that the RM can tell the relaunched NM apart from the previous one.
> That is, we can select the NM ports randomly instead of taking the first few ports, as is currently implemented here:
> https://github.com/apache/incubator-myriad/blob/master/myriad-scheduler/src/main/java/com/ebay/myriad/scheduler/NMPorts.java#L46
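
The random selection proposed in the description could look roughly like the sketch below. This is only an illustration and not Myriad's code: the class name RandomPortPicker, the method pickPorts, and the representation of the offered ports as a flat list of distinct port numbers are all assumptions, and the real NMPorts class may consume the Mesos offer differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper (not part of Myriad): chooses NM ports at random
// from the ports offered by Mesos instead of taking the first few.
final class RandomPortPicker {
    private RandomPortPicker() {}

    // Returns `count` ports drawn at random, without replacement, from
    // `offeredPorts`. Assumes `offeredPorts` contains no duplicates.
    static List<Long> pickPorts(List<Long> offeredPorts, int count, Random rng) {
        if (count < 0 || count > offeredPorts.size()) {
            throw new IllegalArgumentException(
                "need " + count + " ports but only " + offeredPorts.size() + " offered");
        }
        // Shuffle a copy so the caller's list is left untouched.
        List<Long> shuffled = new ArrayList<>(offeredPorts);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, count));
    }
}
```

With a larger pool of offered ports, a relaunched NM is unlikely to land on the same ports as its predecessor, although a collision is still possible, so this lowers the chance of the problem rather than ruling it out.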


