Posted to yarn-dev@hadoop.apache.org by "Jian He (JIRA)" <ji...@apache.org> on 2014/03/30 01:18:17 UTC

[jira] [Resolved] (YARN-1894) RM shutdown due to java.net.UnknownHostException

     [ https://issues.apache.org/jira/browse/YARN-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He resolved YARN-1894.
---------------------------

       Resolution: Fixed
    Fix Version/s: 2.4.0

Thanks for reporting.  

The fix will be included in the 2.4 release. Closing this.

> RM shutdown due to java.net.UnknownHostException
> ------------------------------------------------
>
>                 Key: YARN-1894
>                 URL: https://issues.apache.org/jira/browse/YARN-1894
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Mohammad Kamrul Islam
>             Fix For: 2.4.0
>
>
> Background:
> ----------------
> I started Hadoop 2.3 on my Mac on my office network and submitted a few jobs successfully. When I went home (a new network), I submitted another job and it abruptly brought down the RM service.
> Error in RM log:
> {noformat}
> 2014-03-29 12:28:56,754 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing RMDelegation token with sequence number: 3
> 2014-03-29 12:28:57,256 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: mislam-mn.<MY.OFFICE.DOMAIN>
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
>         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1294)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1342)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1208)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1167)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:868)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:642)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:556)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:696)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:740)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:88)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:543)
>         at java.lang.Thread.run(Thread.java:695)
> Caused by: java.net.UnknownHostException: mislam-mn.linkedin.biz
>         ... 15 more
> 2014-03-29 12:28:57,259 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> 2014-03-29 12:28:57,297 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:8088
> 2014-03-29 12:28:57,401 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
> 2014-03-29 12:28:57,473 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
> .....
> {noformat}
> Proposal:
> ---------------
> I believe the root cause is that I moved my machine from one network to another while the same RM service kept running.
> My point is: whatever the cause, the RM is a long-running core service and it should not exit this way. An appropriate error message should be sufficient.
> If there is a consensus (or no disagreement), I can work on a patch.
>   
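
The proposal in the quoted report (log an error instead of exiting when a hostname cannot be resolved) can be sketched with JDK classes alone. This is only an illustration under stated assumptions, not the patch that was committed for YARN-1894: `NodeUpdateGuard`, `buildTokenService`, `handleNodeUpdate`, the host name `mislam-mn.example.invalid`, and port 8041 are all hypothetical stand-ins. The only behaviour taken from the report is the exception shape seen in the stack trace, an `IllegalArgumentException` whose cause is an `UnknownHostException`.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class NodeUpdateGuard {

    // Hypothetical stand-in for the token-service lookup in the stack trace:
    // it rejects an address whose host name has not been resolved, using the
    // same exception shape that appears in the RM log.
    static String buildTokenService(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                new UnknownHostException(addr.getHostName()));
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    // Returns true if the throwable, or any throwable in its cause chain,
    // is an UnknownHostException.
    static boolean causedByUnknownHost(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical event handler: records an error and reports failure for
    // this one event instead of letting the exception stop the whole service.
    static boolean handleNodeUpdate(InetSocketAddress nodeAddr, List<String> errors) {
        try {
            buildTokenService(nodeAddr);
            return true;
        } catch (IllegalArgumentException e) {
            if (!causedByUnknownHost(e)) {
                throw e; // unrelated failure: do not swallow it
            }
            errors.add("Skipping node update: cannot resolve " + nodeAddr.getHostName());
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        // createUnresolved never performs a DNS lookup, so this runs offline.
        InetSocketAddress bad =
            InetSocketAddress.createUnresolved("mislam-mn.example.invalid", 8041);
        boolean ok = handleNodeUpdate(bad, errors);
        System.out.println("handled=" + ok + ", errors=" + errors);
    }
}
```

The handler only swallows failures whose cause chain contains `UnknownHostException` and rethrows everything else, so unrelated bugs still surface. Whether a real ResourceManager should skip the node, retry resolution, or fail just the affected container allocation is a design decision this sketch does not settle.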


