Posted to mapreduce-user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2014/03/13 21:51:32 UTC

ResourceManager shutting down

We have this erratic behavior where every so often the RM will shut down with an UnknownHostException.  The odd thing is, the host it complains about has been in use for days at that point without problems.  Any ideas?
Thanks,
John


2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
        at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
        ... 15 more
2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@metallica.office.datalever.com:8088
2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
... and so on, it shuts down


Re: ResourceManager shutting down

Posted by Hitesh Shah <hi...@apache.org>.
Hi John

Would you mind filing a JIRA with more details? The RM going down just because a host was not resolvable or DNS timed out is something that should be addressed.

thanks
-- Hitesh

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind… we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:john.lilley@redpoint.net] 
> Sent: Thursday, March 13, 2014 2:52 PM
> To: user@hadoop.apache.org
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shutdown with an UnknownHostException.  The odd thing is, the host it complains about have been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
>         ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@metallica.office.datalever.com:8088
> 2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
> … and so on, it shuts down
>  


Re: ResourceManager shutting down

Posted by Jian He <jh...@hortonworks.com>.
Which Hadoop version are you running? This should have been fixed recently.

Jian


On Thu, Mar 13, 2014 at 8:33 PM, Hitesh Shah <hi...@apache.org> wrote:

> Hi John
>
> Would you mind filing a jira with more details. The RM going down just
> because a host was not resolvable or DNS timed out is something that should
> be addressed.
>
> thanks
> -- Hitesh
>
> On Mar 13, 2014, at 2:29 PM, John Lilley wrote:
>
> > Never mind... we figured out its DNS entry was going missing.
> > john
> >
> > From: John Lilley [mailto:john.lilley@redpoint.net]
> > Sent: Thursday, March 13, 2014 2:52 PM
> > To: user@hadoop.apache.org
> > Subject: ResourceManager shutting down
> >
> > We have this erratic behavior where every so often the RM will shutdown
> with an UnknownHostException.  The odd thing is, the host it complains
> about have been in use for days at that point without problem.  Any ideas?
> > Thanks,
> > John
> >
> >
> > 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl
> (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change
> from ACCEPTED to RUNNING
> > 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager
> (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE
> to the scheduler
> > java.lang.IllegalArgumentException: java.net.UnknownHostException:
> skitzo.office.datalever.com
> >         at
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
> >         at
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> >         at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
> >         ... 15 more
> > 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager
> (ResourceManager.java:run(453)) - Exiting, bbye..
> > 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) -
> Stopped SelectChannelConnector@metallica.office.datalever.com:8088
> > 2014-03-13 14:38:16,013 ERROR
> delegation.AbstractDelegationTokenSecretManager
> (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion
> recieved for ExpiredTokenRemover thread java.lang.InterruptedException:
> sleep interrupted
> > 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics
> system...
> > 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> > 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system
> shutdown complete.
> > 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher
> (ApplicationMasterLauncher.java:run(98)) -
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
> interrupted. Returning.
> > 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) -
> Stopping server on 8141
> > 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) -
> Stopping server on 8050
> > ... and so on, it shuts down
> >
>
>


RE: ResourceManager shutting down

Posted by Rohith Sharma K S <ro...@huawei.com>.
Hi Hitesh,

          Yes, it is an issue. It is handled in https://issues.apache.org/jira/i#browse/YARN-713, which fixes the DNS issue. The fix is available in hadoop-2.4 (unreleased).


Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Hitesh Shah [mailto:hitesh@apache.org] 
Sent: 14 March 2014 09:03
To: user@hadoop.apache.org
Subject: Re: ResourceManager shutting down

Hi John

Would you mind filing a jira with more details. The RM going down just because a host was not resolvable or DNS timed out is something that should be addressed.

thanks
-- Hitesh

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind... we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:john.lilley@redpoint.net]
> Sent: Thursday, March 13, 2014 2:52 PM
> To: user@hadoop.apache.org
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shutdown with an UnknownHostException.  The odd thing is, the host it complains about have been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl 
> (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State 
> change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(449)) - Error in handling event type 
> NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
>         ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - 
> Stopped SelectChannelConnector@metallica.office.datalever.com:8088
> 2014-03-13 14:38:16,013 ERROR 
> delegation.AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(557)) - 
> InterruptedExcpetion recieved for ExpiredTokenRemover thread 
> java.lang.InterruptedException: sleep interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - 
> Stopping server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - 
> Stopping server on 8050
> ... and so on, it shuts down
>  
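
For readers following the trace: the FATAL entry shows an UnknownHostException wrapped in an unchecked IllegalArgumentException, which escapes the scheduler's event handler and takes the whole RM down. The sketch below only illustrates that failure shape and one possible way to tolerate it. It is not Hadoop code and not the YARN-713 patch; the class UnresolvedHostSketch and the methods buildServiceStrict and assignTolerant are hypothetical names made up for this example.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class UnresolvedHostSketch {

    // Mirrors the exception shape in the trace: an address that could not
    // be resolved surfaces as IllegalArgumentException(UnknownHostException).
    static String buildServiceStrict(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                    new UnknownHostException(addr.getHostName()));
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    // Assumed tolerant behavior: a node whose name does not resolve is
    // skipped for this pass instead of aborting the whole loop.
    static List<String> assignTolerant(List<InetSocketAddress> nodes) {
        List<String> services = new ArrayList<>();
        for (InetSocketAddress node : nodes) {
            try {
                services.add(buildServiceStrict(node));
            } catch (IllegalArgumentException e) {
                System.err.println("Skipping node, cannot resolve: "
                        + node.getHostName());
            }
        }
        return services;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> nodes = new ArrayList<>();
        // Loopback is resolved without any DNS lookup.
        nodes.add(new InetSocketAddress(InetAddress.getLoopbackAddress(), 8041));
        // createUnresolved never attempts a lookup, so this stands in for
        // a host whose DNS entry has gone missing.
        nodes.add(InetSocketAddress.createUnresolved("missing-host.invalid", 8041));
        System.out.println(assignTolerant(nodes));
    }
}
```

Running main should list only the loopback entry and log the skipped host to stderr. Whether skipping, retrying, or caching the last known address is the right policy for a real scheduler is a separate design question that this sketch does not try to answer.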


Re: ResourceManager shutting down

Posted by Jian He <jh...@hortonworks.com>.
Which Hadoop version are you running ? this should be recently fixed.

Jian


On Thu, Mar 13, 2014 at 8:33 PM, Hitesh Shah <hi...@apache.org> wrote:

> Hi John
>
> Would you mind filing a jira with more details. The RM going down just
> because a host was not resolvable or DNS timed out is something that should
> be addressed.
>
> thanks
> -- Hitesh
>
> On Mar 13, 2014, at 2:29 PM, John Lilley wrote:
>
> > Never mind... we figured out its DNS entry was going missing.
> > john
> >
> > From: John Lilley [mailto:john.lilley@redpoint.net]
> > Sent: Thursday, March 13, 2014 2:52 PM
> > To: user@hadoop.apache.org
> > Subject: ResourceManager shutting down
> >
> > We have this erratic behavior where every so often the RM will shutdown
> with an UnknownHostException.  The odd thing is, the host it complains
> about have been in use for days at that point without problem.  Any ideas?
> > Thanks,
> > John
> >
> >
> > 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl
> (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change
> from ACCEPTED to RUNNING
> > 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager
> (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE
> to the scheduler
> > java.lang.IllegalArgumentException: java.net.UnknownHostException:
> skitzo.office.datalever.com
> >         at
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
> >         at
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
> >         at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> >         at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
> >         ... 15 more
> > 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager
> (ResourceManager.java:run(453)) - Exiting, bbye..
> > 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) -
> Stopped SelectChannelConnector@metallica.office.datalever.com:8088
> > 2014-03-13 14:38:16,013 ERROR
> delegation.AbstractDelegationTokenSecretManager
> (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion
> recieved for ExpiredTokenRemover thread java.lang.InterruptedException:
> sleep interrupted
> > 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics
> system...
> > 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> > 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system
> shutdown complete.
> > 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher
> (ApplicationMasterLauncher.java:run(98)) -
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
> interrupted. Returning.
> > 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) -
> Stopping server on 8141
> > 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) -
> Stopping server on 8050
> > ... and so on, it shuts down
> >
>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

RE: ResourceManager shutting down

Posted by Rohith Sharma K S <ro...@huawei.com>.
Hi Hitesh,

          Yes it is an issue. This is handled in https://issues.apache.org/jira/i#browse/YARN-713 fixes DNS Issue. This fix available on hadoop-2.4(unreleased).


Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Hitesh Shah [mailto:hitesh@apache.org] 
Sent: 14 March 2014 09:03
To: user@hadoop.apache.org
Subject: Re: ResourceManager shutting down

Hi John

Would you mind filing a jira with more details. The RM going down just because a host was not resolvable or DNS timed out is something that should be addressed.

thanks
-- Hitesh

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind... we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:john.lilley@redpoint.net]
> Sent: Thursday, March 13, 2014 2:52 PM
> To: user@hadoop.apache.org
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shutdown with an UnknownHostException.  The odd thing is, the host it complains about have been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl 
> (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State 
> change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(449)) - Error in handling event type 
> NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
>         ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - 
> Stopped SelectChannelConnector@metallica.office.datalever.com:8088
> 2014-03-13 14:38:16,013 ERROR 
> delegation.AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(557)) - 
> InterruptedExcpetion recieved for ExpiredTokenRemover thread 
> java.lang.InterruptedException: sleep interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - 
> Stopping server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - 
> Stopping server on 8050 ... and so on, it shuts down
>  


Re: ResourceManager shutting down

Posted by Jian He <jh...@hortonworks.com>.
Which Hadoop version are you running ? this should be recently fixed.

Jian


On Thu, Mar 13, 2014 at 8:33 PM, Hitesh Shah <hi...@apache.org> wrote:

> Hi John
>
> Would you mind filing a jira with more details. The RM going down just
> because a host was not resolvable or DNS timed out is something that should
> be addressed.
>
> thanks
> -- Hitesh
>
> On Mar 13, 2014, at 2:29 PM, John Lilley wrote:
>
> > Never mind... we figured out its DNS entry was going missing.
> > john
> >
> > From: John Lilley [mailto:john.lilley@redpoint.net]
> > Sent: Thursday, March 13, 2014 2:52 PM
> > To: user@hadoop.apache.org
> > Subject: ResourceManager shutting down
> >
> > We have this erratic behavior where every so often the RM will shutdown with an UnknownHostException.  The odd thing is, the host it complains about have been in use for days at that point without problem.  Any ideas?
> > Thanks,
> > John
> >
> >
> > 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
> > 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
> > java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
> >         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
> >         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
> >         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
> >         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> >         at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
> >         ... 15 more
> > 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
> > 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@metallica.office.datalever.com:8088
> > 2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
> > 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
> > 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> > 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
> > 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
> > 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
> > 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
> > ... and so on, it shuts down
> >
>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

RE: ResourceManager shutting down

Posted by Rohith Sharma K S <ro...@huawei.com>.
Hi Hitesh,

          Yes, it is an issue. It is handled in https://issues.apache.org/jira/i#browse/YARN-713, which fixes the DNS issue. The fix is available in hadoop-2.4 (unreleased).


Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Hitesh Shah [mailto:hitesh@apache.org] 
Sent: 14 March 2014 09:03
To: user@hadoop.apache.org
Subject: Re: ResourceManager shutting down

Hi John

Would you mind filing a JIRA with more details? The RM going down just because a host was not resolvable or DNS timed out is something that should be addressed.

thanks
-- Hitesh

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind... we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:john.lilley@redpoint.net]
> Sent: Thursday, March 13, 2014 2:52 PM
> To: user@hadoop.apache.org
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shutdown with an UnknownHostException.  The odd thing is, the host it complains about have been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
>         ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@metallica.office.datalever.com:8088
> 2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
> ... and so on, it shuts down
>  


Re: ResourceManager shutting down

Posted by Hitesh Shah <hi...@apache.org>.
Hi John

Would you mind filing a JIRA with more details? The RM going down just because a host was not resolvable or DNS timed out is something that should be addressed.

thanks
-- Hitesh
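
To illustrate the point above, that a failed lookup for one host need not be fatal, here is a minimal Java sketch. It is not the YARN fix and not Hadoop code: the class HostResolutionCheck, the Resolver interface and the findUnresolvable method are hypothetical names made up for this example. It assumes only the standard java.net.InetAddress API, and it records hosts that fail to resolve instead of letting UnknownHostException propagate.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): reports hosts that fail DNS resolution. */
final class HostResolutionCheck {

    /** Lookup abstraction so the check can be exercised without a real DNS server. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /** Returns the hosts the resolver could not resolve, in input order. */
    static List<String> findUnresolvable(List<String> hosts, Resolver resolver) {
        List<String> failed = new ArrayList<>();
        for (String host : hosts) {
            try {
                resolver.resolve(host);
            } catch (UnknownHostException e) {
                // Record the failure and keep going instead of letting it propagate.
                failed.add(host);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Uses the JVM's normal lookup; pass the node hostnames as arguments.
        List<String> failed = findUnresolvable(Arrays.asList(args), InetAddress::getByName);
        for (String host : failed) {
            System.err.println("WARN: cannot resolve " + host);
        }
    }
}
```

Run from the RM host with the node hostnames as arguments, this prints a warning for each name that does not resolve at that moment, which may help catch a DNS entry that goes missing intermittently.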

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind… we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:john.lilley@redpoint.net] 
> Sent: Thursday, March 13, 2014 2:52 PM
> To: user@hadoop.apache.org
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shutdown with an UnknownHostException.  The odd thing is, the host it complains about have been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
>         ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@metallica.office.datalever.com:8088
> 2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
> … and so on, it shuts down
>  


Re: ResourceManager shutting down

Posted by Hitesh Shah <hi...@apache.org>.
Hi John

Would you mind filing a jira with more details. The RM going down just because a host was not resolvable or DNS timed out is something that should be addressed.

thanks
-- Hitesh

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind… we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:john.lilley@redpoint.net] 
> Sent: Thursday, March 13, 2014 2:52 PM
> To: user@hadoop.apache.org
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shut down with an UnknownHostException.  The odd thing is, the host it complains about has been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
>         ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@metallica.office.datalever.com:8088
> 2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
> … and so on, it shuts down
>  


RE: ResourceManager shutting down

Posted by John Lilley <jo...@redpoint.net>.
Never mind... we figured out its DNS entry was going missing.
john

From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: Thursday, March 13, 2014 2:52 PM
To: user@hadoop.apache.org
Subject: ResourceManager shutting down

We have this erratic behavior where every so often the RM will shut down with an UnknownHostException.  The odd thing is, the host it complains about has been in use for days at that point without problem.  Any ideas?
Thanks,
John


2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
        at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
        ... 15 more
2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@metallica.office.datalever.com:8088
2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
... and so on, it shuts down
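
Since the cause here turned out to be a DNS entry that intermittently went missing, a small standalone check run on the RM host may help confirm that kind of gap. The following is only a sketch and is not from the thread: the class name DnsCheck and its methods are hypothetical, and it assumes a plain java.net.InetAddress lookup is representative of what the RM sees. Note that the JVM caches lookups (the networkaddress.cache.ttl and networkaddress.cache.negative.ttl security properties), so polling faster than those TTLs will mostly re-read the cache.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Date;

// Hypothetical helper, not part of Hadoop: polls a hostname and reports failed lookups.
class DnsCheck {

    // Returns true if the name resolves right now, false if the lookup
    // throws UnknownHostException (the exception seen in the RM log).
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Tries the lookup `attempts` times, sleeping `sleepMillis` between tries,
    // and returns how many of the lookups failed.
    static int countFailures(String host, int attempts, long sleepMillis)
            throws InterruptedException {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            if (!resolves(host)) {
                failures++;
                System.err.println(new Date() + " lookup failed for " + host);
            }
            if (i < attempts - 1) {
                Thread.sleep(sleepMillis);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        // Usage (assumed): java DnsCheck <hostname> [attempts] [sleepMillis]
        String host = args.length > 0 ? args[0] : "localhost";
        int attempts = args.length > 1 ? Integer.parseInt(args[1]) : 5;
        long sleepMillis = args.length > 2 ? Long.parseLong(args[2]) : 60000L;
        int failures = countFailures(host, attempts, sleepMillis);
        System.out.println(failures + " of " + attempts + " lookups failed for " + host);
    }
}
```

Running it against the NodeManager hostname from the stack trace for a day or so, with an interval longer than the JVM's DNS cache TTL, should show whether lookups fail at the times the RM exits. It only diagnoses the environment; it does not change how the RM reacts to a failed lookup.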


2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
... and so on, it shuts down