Posted to yarn-issues@hadoop.apache.org by "Heng Chen (JIRA)" <ji...@apache.org> on 2016/08/12 08:05:21 UTC

[jira] [Commented] (YARN-3001) RM dies because of divide by zero

    [ https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418495#comment-15418495 ] 

Heng Chen commented on YARN-3001:
---------------------------------

We hit this issue too. Our cluster runs 2.5.0, and the ResourceManager log shows:
{code}
2016-08-12 02:06:51,204 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8644_01_000820 Container Transitioned from NEW to RESERVED
2016-08-12 02:06:51,204 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container  application attempt=appattempt_1470798127749_8644_000001 resource=<memory:2048, vCores:1> queue=default: capacity=0.65, absoluteCapacity=0.65, usedResources=<memory:3719168, vCores:1715>, usedCapacity=1.1451658, absoluteUsedCapacity=0.74435765, numApps=19, numContainers=1715 node=host: dx-pipe-sata114-pm:38694 #containers=36 available=19456 used=82944 clusterResource=<memory:6553600, vCores:2304>
2016-08-12 02:06:51,204 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned queue: root.default stats: default: capacity=0.65, absoluteCapacity=0.65, usedResources=<memory:3721216, vCores:1716>, usedCapacity=1.1458336, absoluteUsedCapacity=0.7447917, numApps=19, numContainers=1716
2016-08-12 02:06:51,204 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.86067706 absoluteUsedCapacity=0.86067706 used=<memory:4965376, vCores:1983> cluster=<memory:6553600, vCores:2304>
2016-08-12 02:06:51,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000041 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000042 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000043 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000044 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000045 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000046 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000047 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,223 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000048 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,223 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000049 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,223 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000050 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,223 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000051 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,223 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000052 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,223 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1470798127749_8660_01_000053 Container Transitioned from ALLOCATED to ACQUIRED
2016-08-12 02:06:51,223 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.ArithmeticException: / by zero
        at org.apache.hadoop.yarn.util.resource.DominantResourceCalculator.computeAvailableContainers(DominantResourceCalculator.java:101)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
        at java.lang.Thread.run(Thread.java:745)
2016-08-12 02:06:51,224 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2016-08-12 02:06:51,230 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@f04:8088
2016-08-12 02:06:51,331 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2016-08-12 02:06:51,332 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-08-12 02:06:51,332 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032
2016-08-12 02:06:51,333 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
2016-08-12 02:06:51,334 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033
2016-08-12 02:06:51,335 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
{code}
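
For reference, in Java "/ by zero" is only thrown by integer division or remainder, so the top frame suggests that computeAvailableContainers divided by a required resource whose memory or vCores value was 0. Below is a minimal sketch of that failure mode and one possible guard. It is not the actual DominantResourceCalculator source: the Res class, the method names, and the node's vCores value are hypothetical, and the "a zero component places no constraint" policy is an assumption, not the fix applied in YARN.

```java
// Sketch only: Res and both methods are hypothetical stand-ins, not YARN classes.
class AvailableContainersSketch {

    // Simplified resource: memory in MB plus virtual cores.
    static final class Res {
        final int memory;
        final int vCores;

        Res(int memory, int vCores) {
            this.memory = memory;
            this.vCores = vCores;
        }
    }

    // Unguarded integer division: throws ArithmeticException ("/ by zero")
    // as soon as either component of the request is 0.
    static int unguarded(Res available, Res required) {
        return Math.min(available.memory / required.memory,
                        available.vCores / required.vCores);
    }

    // Guarded variant (assumed policy): a zero component in the request
    // places no constraint on how many containers fit.
    static int guarded(Res available, Res required) {
        int byMemory = required.memory == 0
                ? Integer.MAX_VALUE
                : available.memory / required.memory;
        int byCores = required.vCores == 0
                ? Integer.MAX_VALUE
                : available.vCores / required.vCores;
        return Math.min(byMemory, byCores);
    }

    public static void main(String[] args) {
        // 19456 MB matches "available=19456" in the log; 4 vCores is made up.
        Res available = new Res(19456, 4);

        // A normal request like <memory:2048, vCores:1> divides cleanly.
        System.out.println(unguarded(available, new Res(2048, 1)));

        // A request with vCores == 0 reproduces the exception type in the trace.
        try {
            unguarded(available, new Res(2048, 0));
        } catch (ArithmeticException e) {
            System.out.println("unguarded: " + e.getMessage());
        }

        // The guarded variant falls back to the memory-only limit.
        System.out.println(guarded(available, new Res(2048, 0)));
    }
}
```

Note that the reserved container logged just before the crash requested <memory:2048, vCores:1>, so the zero-valued request, if that is the cause, must come from a different resource request than the one shown in the log above.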


[~huizane] Have you solved the problem? Thanks


> RM dies because of divide by zero
> ---------------------------------
>
>                 Key: YARN-3001
>                 URL: https://issues.apache.org/jira/browse/YARN-3001
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.1
>            Reporter: hoelog
>            Assignee: Rohith Sharma K S
>
> RM dies because of divide by zero exception.
> {code}
> 2014-12-31 21:27:05,022 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
> java.lang.ArithmeticException: / by zero
>     at org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
>     at java.lang.Thread.run(Thread.java:745)
> 2014-12-31 21:27:05,023 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
