You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Amithsha (Jira)" <ji...@apache.org> on 2019/08/24 09:38:00 UTC

[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

    [ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914888#comment-16914888 ] 

Amithsha commented on YARN-9596:
--------------------------------

Hi All, i am getting the following Exception in 2.9.0 Is this related to the above issue ????

2019-08-22 23:59:20,180 FATAL event.EventDispatcher (?:?(?)) - Error in handling event type NODE_UPDATE to the Event Dispatcher

java.lang.NullPointerException

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.canAssign(RegularContainerAllocator.java:301)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignOffSwitchContainers(RegularContainerAllocator.java:388)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainersOnNode(RegularContainerAllocator.java:469)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.tryAllocateOnNode(RegularContainerAllocator.java:250)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:819)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:857)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:55)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:868)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1121)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:734)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:558)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1346)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1341)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1430)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1205)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1067)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1472)

 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:151)

 at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)

 at java.lang.Thread.run(Thread.java:745)

 

> QueueMetrics has incorrect metrics when labelled partitions are involved
> ------------------------------------------------------------------------
>
>                 Key: YARN-9596
>                 URL: https://issues.apache.org/jira/browse/YARN-9596
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.8.0, 3.3.0
>            Reporter: Muhammad Samir Khan
>            Assignee: Muhammad Samir Khan
>            Priority: Major
>             Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
>         Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-2.8.005.patch, YARN-9596-branch-3.0.004.patch, YARN-9596.001.patch, YARN-9596.002.patch, YARN-9596.003.patch
>
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default partition. However, the metrics are incorrect when labelled partitions are involved.
> Steps to reproduce
> ==============
>  # Configure capacity-scheduler.xml with label configuration
>  # Add label "test" to cluster and replace label on node1 to be "test"
>  # Note down "totalMB" at <resourcemanager.webapp.address:port>/ws/v1/cluster/metrics
>  # Start first job on test queue.
>  # Start second job on default queue (does not work if the order of two jobs is swapped).
>  # While the two applications are running, the "totalMB" at <resourcemanager.webapp.address:port>/ws/v1/cluster/metrics will go down by the amount of MB used by the first job (screenshots attached).
> Alternately:
> In TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(), add the following line at the end of the test before rm1.close():
> CSQueue rootQueue = cs.getRootQueue();
> assertEquals(10*GB,
>  rootQueue.getMetrics().getAvailableMB() + rootQueue.getMetrics().getAllocatedMB());
> There are two nodes of 10GB each and only one of them have a non-default label. The test will also fail against 20*GB check.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org