Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2022/09/16 16:48:36 UTC

[GitHub] [hadoop] goiri commented on a diff in pull request #4726: YARN-11191 Global Scheduler refreshQueue cause deadLock

goiri commented on code in PR #4726:
URL: https://github.com/apache/hadoop/pull/4726#discussion_r973210207


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java:
##########
@@ -1384,7 +1385,19 @@ public List<CSQueue> getChildQueues() {
     }
 
   }
-  
+
+  @Override
+  public List<CSQueue> getChildQueuesByTryLock() {
+    try {
+      while (!readLock.tryLock()){

Review Comment:
   Why spin on `tryLock()` here instead of just calling a regular `lock()`?
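
   For context on the question above, here is a minimal standalone sketch (not code from this PR) of one way the two calls can differ on a `java.util.concurrent.locks.ReentrantReadWriteLock`. The untimed `ReadLock.tryLock()` is documented to "barge": it takes the read lock whenever no other thread holds the write lock, even if writers are waiting. A blocking read acquisition on a nonfair lock may instead queue behind a waiting writer in OpenJDK; that is implementation behavior, not an API guarantee. The class name `TryLockVsLockSketch` and the method `probe` are hypothetical, and the timed `tryLock` stands in for `lock()` only so the sketch terminates.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TryLockVsLockSketch {

  /**
   * Returns two flags: [0] whether an untimed tryLock() got the read lock,
   * [1] whether a queue-respecting timed tryLock got it, both attempted
   * while another reader holds the lock and a writer is already waiting.
   */
  static boolean[] probe() throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // nonfair by default
    boolean[] result = new boolean[2];

    // This thread plays the long-running reader.
    lock.readLock().lock();

    // A writer that has to wait because the read lock is held.
    Thread writer = new Thread(() -> {
      lock.writeLock().lock();
      lock.writeLock().unlock();
    });
    writer.start();
    while (!lock.hasQueuedThread(writer)) {
      Thread.sleep(10); // wait until the writer is actually queued
    }

    // A second reader that arrives while the writer is waiting.
    Thread reader = new Thread(() -> {
      // Untimed tryLock() ignores the queue and barges in.
      result[0] = lock.readLock().tryLock();
      if (result[0]) {
        lock.readLock().unlock();
      }
      try {
        // Timed tryLock goes through the normal acquisition path,
        // like lock(), but gives up after the timeout.
        result[1] = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (result[1]) {
          lock.readLock().unlock();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    reader.start();
    reader.join();

    // Release the original read lock so the writer can finish.
    lock.readLock().unlock();
    writer.join();
    return result;
  }

  public static void main(String[] args) throws InterruptedException {
    boolean[] r = probe();
    System.out.println("untimed tryLock() acquired: " + r[0]);
    System.out.println("timed tryLock acquired:     " + r[1]);
  }
}
```

   If this difference is what the spin loop relies on, a short comment on `getChildQueuesByTryLock()` saying so would answer the question for future readers.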



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java:
##########
@@ -3026,4 +3030,69 @@ public void testReservedContainerLeakWhenMoveApplication() throws Exception {
     Assert.assertEquals(0, desQueue.getUsedResources().getMemorySize());
     rm1.close();
   }
+  @Test
+  public void testRefreshQueueWithOpenPreemption() throws Exception {
+    CapacitySchedulerConfiguration csConf

Review Comment:
   The line limit is 100 characters, so this declaration should fit on one line.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/preemption/PreemptionManager.java:
##########
@@ -25,10 +25,7 @@
 import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueue;
 import org.apache.hadoop.yarn.util.resource.Resources;
 
-import java.util.Collections;

Review Comment:
   Avoid this import change.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java:
##########
@@ -3026,4 +3030,69 @@ public void testReservedContainerLeakWhenMoveApplication() throws Exception {
     Assert.assertEquals(0, desQueue.getUsedResources().getMemorySize());
     rm1.close();
   }
+  @Test
+  public void testRefreshQueueWithOpenPreemption() throws Exception {
+    CapacitySchedulerConfiguration csConf
+            = new CapacitySchedulerConfiguration();
+    csConf.setQueues(CapacitySchedulerConfiguration.ROOT,
+            new String[] {"a"});
+    csConf.setCapacity("root.a", 100);
+    csConf.setMaximumCapacity("root.a", 100);
+    csConf.setUserLimitFactor("root.a", 100);
+
+    YarnConfiguration conf=new YarnConfiguration(csConf);
+    conf.setClass(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class,
+            ResourceScheduler.class);
+    RMNodeLabelsManager mgr=new NullRMNodeLabelsManager();
+    mgr.init(conf);
+    MockRM rm1 = new MockRM(csConf);
+    CapacityScheduler scheduler=(CapacityScheduler) rm1.getResourceScheduler();
+    PreemptionManager preemptionManager = scheduler.getPreemptionManager();;

Review Comment:
   Double semicolon (`;;`); remove the extra one.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java:
##########
@@ -3026,4 +3030,69 @@ public void testReservedContainerLeakWhenMoveApplication() throws Exception {
     Assert.assertEquals(0, desQueue.getUsedResources().getMemorySize());
     rm1.close();
   }
+  @Test
+  public void testRefreshQueueWithOpenPreemption() throws Exception {
+    CapacitySchedulerConfiguration csConf
+            = new CapacitySchedulerConfiguration();
+    csConf.setQueues(CapacitySchedulerConfiguration.ROOT,
+            new String[] {"a"});
+    csConf.setCapacity("root.a", 100);
+    csConf.setMaximumCapacity("root.a", 100);
+    csConf.setUserLimitFactor("root.a", 100);
+
+    YarnConfiguration conf=new YarnConfiguration(csConf);
+    conf.setClass(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class,
+            ResourceScheduler.class);
+    RMNodeLabelsManager mgr=new NullRMNodeLabelsManager();

Review Comment:
   Add spaces around `=`.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java:
##########
@@ -3026,4 +3030,69 @@ public void testReservedContainerLeakWhenMoveApplication() throws Exception {
     Assert.assertEquals(0, desQueue.getUsedResources().getMemorySize());
     rm1.close();
   }
+  @Test
+  public void testRefreshQueueWithOpenPreemption() throws Exception {
+    CapacitySchedulerConfiguration csConf
+            = new CapacitySchedulerConfiguration();
+    csConf.setQueues(CapacitySchedulerConfiguration.ROOT,
+            new String[] {"a"});
+    csConf.setCapacity("root.a", 100);
+    csConf.setMaximumCapacity("root.a", 100);
+    csConf.setUserLimitFactor("root.a", 100);
+
+    YarnConfiguration conf=new YarnConfiguration(csConf);
+    conf.setClass(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class,

Review Comment:
   This call fits on one line.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java:
##########
@@ -3026,4 +3030,69 @@ public void testReservedContainerLeakWhenMoveApplication() throws Exception {
     Assert.assertEquals(0, desQueue.getUsedResources().getMemorySize());
     rm1.close();
   }
+  @Test
+  public void testRefreshQueueWithOpenPreemption() throws Exception {
+    CapacitySchedulerConfiguration csConf
+            = new CapacitySchedulerConfiguration();
+    csConf.setQueues(CapacitySchedulerConfiguration.ROOT,
+            new String[] {"a"});
+    csConf.setCapacity("root.a", 100);
+    csConf.setMaximumCapacity("root.a", 100);
+    csConf.setUserLimitFactor("root.a", 100);
+
+    YarnConfiguration conf=new YarnConfiguration(csConf);
+    conf.setClass(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class,
+            ResourceScheduler.class);
+    RMNodeLabelsManager mgr=new NullRMNodeLabelsManager();
+    mgr.init(conf);
+    MockRM rm1 = new MockRM(csConf);
+    CapacityScheduler scheduler=(CapacityScheduler) rm1.getResourceScheduler();
+    PreemptionManager preemptionManager = scheduler.getPreemptionManager();;
+    rm1.getRMContext().setNodeLabelManager(mgr);
+    rm1.start();
+
+    LeafQueue srcQueue = (LeafQueue) scheduler.getQueue("a");
+
+    Thread schedulerThread = new Thread(()-> {
+      srcQueue.readLock.lock();
+      try {
+        Thread.sleep(1000 * 15);
+      } catch (InterruptedException e) {
+        e.printStackTrace();
+      }
+      preemptionManager.getKillableContainers("a",
+              srcQueue.getDefaultNodeLabelExpression());
+      srcQueue.readLock.unlock();
+    });
+
+    Thread completeThread = new Thread(() ->{
+      try {
+        Thread.sleep(1000 * 5);
+      } catch (InterruptedException e) {
+        e.printStackTrace();
+      }
+      srcQueue.writeLock.lock();
+      srcQueue.writeLock.unlock();
+    });
+
+    Thread refreshQueueThread = new Thread(()->{
+      preemptionManager.getWriteLock().lock();
+      try {
+        Thread.sleep(1000 * 10);

Review Comment:
   Fix the spacing, e.g. around the lambda arrow (`() -> {`).



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java:
##########
@@ -3026,4 +3030,69 @@ public void testReservedContainerLeakWhenMoveApplication() throws Exception {
     Assert.assertEquals(0, desQueue.getUsedResources().getMemorySize());
     rm1.close();
   }
+  @Test
+  public void testRefreshQueueWithOpenPreemption() throws Exception {
+    CapacitySchedulerConfiguration csConf
+            = new CapacitySchedulerConfiguration();
+    csConf.setQueues(CapacitySchedulerConfiguration.ROOT,
+            new String[] {"a"});
+    csConf.setCapacity("root.a", 100);
+    csConf.setMaximumCapacity("root.a", 100);
+    csConf.setUserLimitFactor("root.a", 100);
+
+    YarnConfiguration conf=new YarnConfiguration(csConf);
+    conf.setClass(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class,
+            ResourceScheduler.class);
+    RMNodeLabelsManager mgr=new NullRMNodeLabelsManager();
+    mgr.init(conf);
+    MockRM rm1 = new MockRM(csConf);
+    CapacityScheduler scheduler=(CapacityScheduler) rm1.getResourceScheduler();
+    PreemptionManager preemptionManager = scheduler.getPreemptionManager();;
+    rm1.getRMContext().setNodeLabelManager(mgr);
+    rm1.start();
+
+    LeafQueue srcQueue = (LeafQueue) scheduler.getQueue("a");
+
+    Thread schedulerThread = new Thread(()-> {
+      srcQueue.readLock.lock();
+      try {
+        Thread.sleep(1000 * 15);
+      } catch (InterruptedException e) {
+        e.printStackTrace();
+      }
+      preemptionManager.getKillableContainers("a",

Review Comment:
   This call fits on one line.



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java:
##########
@@ -3026,4 +3030,69 @@ public void testReservedContainerLeakWhenMoveApplication() throws Exception {
     Assert.assertEquals(0, desQueue.getUsedResources().getMemorySize());
     rm1.close();
   }
+  @Test
+  public void testRefreshQueueWithOpenPreemption() throws Exception {

Review Comment:
   Add a comment to this test explaining the locking sequence it exercises.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org