You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/06/17 03:21:21 UTC

[GitHub] [flink] KarmaGYZ commented on a diff in pull request #19840: [FLINK-24713][Runtime/Coordination] Support the initial delay for SlotManager to wait fo…

KarmaGYZ commented on code in PR #19840:
URL: https://github.com/apache/flink/pull/19840#discussion_r899686715


##########
flink-core/src/main/java/org/apache/flink/configuration/ResourceManagerOptions.java:
##########
@@ -150,6 +150,18 @@ public class ResourceManagerOptions {
                     .defaultValue(Duration.ofMillis(50))
                     .withDescription("The delay of the resource requirements check.");
 
+    /**
+     * The long delay of requirements check. This is only used for waiting for the previous
+     * TaskManagers registration and will only take effect in the first requirements check.
+     */
+    public static final ConfigOption<Duration> REQUIREMENTS_CHECK_LONG_DELAY =

Review Comment:
   ```suggestion
       @Documentation.Section(Documentation.Sections.EXPERT_SCHEDULING)
       public static final ConfigOption<Duration> REQUIREMENTS_CHECK_LONG_DELAY =
   ```



##########
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java:
##########
@@ -491,10 +504,15 @@ public void freeSlot(SlotID slotId, AllocationID allocationId) {
      * are performed with a slight delay.
      */
     private void checkResourceRequirementsWithDelay() {
-        if (requirementsCheckDelay.toMillis() <= 0) {
-            checkResourceRequirements();
-        } else {
-            if (requirementsCheckFuture == null || requirementsCheckFuture.isDone()) {
+        long delay =
+                isRequirementsCheckLongDelay
+                        ? requirementsCheckLongDelay.toMillis()
+                        : requirementsCheckDelay.toMillis();
+        boolean pendingRequirementsCheck =

Review Comment:
   ```suggestion
           boolean hasPendingRequirementsCheck =
   ```



##########
flink-core/src/main/java/org/apache/flink/configuration/ResourceManagerOptions.java:
##########
@@ -150,6 +150,18 @@ public class ResourceManagerOptions {
                     .defaultValue(Duration.ofMillis(50))
                     .withDescription("The delay of the resource requirements check.");
 
+    /**
+     * The long delay of requirements check. This is only used for waiting for the previous
+     * TaskManagers registration and will only take effect in the first requirements check.
+     */
+    public static final ConfigOption<Duration> REQUIREMENTS_CHECK_LONG_DELAY =
+            ConfigOptions.key("slotmanager.requirement-check.long-delay")
+                    .durationType()
+                    .defaultValue(Duration.ofSeconds(3))

Review Comment:
   I think the default value might be too long. 1s or 500ms might be good enough?



##########
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManager.java:
##########
@@ -434,10 +446,15 @@ public void freeSlot(SlotID slotId, AllocationID allocationId) {
      * are performed with a slight delay.
      */
     private void checkResourceRequirementsWithDelay() {

Review Comment:
   Same as the `FineGrainedSlotManager`.



##########
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/FineGrainedSlotManager.java:
##########
@@ -504,8 +522,11 @@ private void checkResourceRequirementsWithDelay() {
                                             Preconditions.checkNotNull(requirementsCheckFuture)
                                                     .complete(null);
                                         }),
-                        requirementsCheckDelay.toMillis(),
+                        delay,
                         TimeUnit.MILLISECONDS);
+                isRequirementsCheckLongDelay = false;
+            } else {
+                checkResourceRequirements();

Review Comment:
   It seems a bit complicated here. We just need to ensure the delay is no less than zero. Then, the `ScheduledExecutor#schedule` will help us to handle it.



##########
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/AbstractFineGrainedSlotManagerITCase.java:
##########
@@ -752,4 +753,76 @@ public void testAllocationUpdatesIgnoredIfSlotMarkedAsAllocatedAfterSlotReport()
 
         System.setSecurityManager(null);
     }
+
+    @Test
+    public void testRequirementLongDelayOnlyTakeEffectOnce() throws Exception {
+        final CompletableFuture<Void> allocationIdFuture1 = new CompletableFuture<>();
+        final CompletableFuture<Void> allocationIdFuture2 = new CompletableFuture<>();
+        final TaskExecutorGateway taskExecutorGateway1 =
+                new TestingTaskExecutorGatewayBuilder()
+                        .setRequestSlotFunction(
+                                tuple6 -> {
+                                    allocationIdFuture1.complete(null);
+                                    return CompletableFuture.completedFuture(Acknowledge.get());
+                                })
+                        .createTestingTaskExecutorGateway();
+        final TaskExecutorGateway taskExecutorGateway2 =
+                new TestingTaskExecutorGatewayBuilder()
+                        .setRequestSlotFunction(
+                                tuple6 -> {
+                                    allocationIdFuture2.complete(null);
+                                    return CompletableFuture.completedFuture(Acknowledge.get());
+                                })
+                        .createTestingTaskExecutorGateway();
+        final TaskExecutorConnection taskExecutorConnection1 =
+                new TaskExecutorConnection(ResourceID.generate(), taskExecutorGateway1);
+        final TaskExecutorConnection taskExecutorConnection2 =
+                new TaskExecutorConnection(ResourceID.generate(), taskExecutorGateway2);
+        new Context() {
+            {
+                runTest(
+                        () -> {
+                            final CompletableFuture<Boolean> registerTaskManagerFuture1 =
+                                    new CompletableFuture<>();
+                            final CompletableFuture<Boolean> registerTaskManagerFuture2 =
+                                    new CompletableFuture<>();
+                            runInMainThread(
+                                    () -> {
+                                        getSlotManager().enlargeRequirementsCheckDelayOnce();
+                                        getSlotManager()
+                                                .processResourceRequirements(
+                                                        createResourceRequirements(
+                                                                new JobID(),
+                                                                2,
+                                                                DEFAULT_SLOT_RESOURCE_PROFILE));
+                                        registerTaskManagerFuture1.complete(
+                                                getSlotManager()
+                                                        .registerTaskManager(
+                                                                taskExecutorConnection1,
+                                                                new SlotReport(),
+                                                                DEFAULT_SLOT_RESOURCE_PROFILE,
+                                                                DEFAULT_SLOT_RESOURCE_PROFILE));
+                                    });
+                            assertThat(
+                                    assertFutureCompleteAndReturn(registerTaskManagerFuture1),
+                                    is(true));
+                            assertFutureNotComplete(allocationIdFuture1);
+                            allocationIdFuture1.get(200, TimeUnit.MILLISECONDS);

Review Comment:
   It's a bit fragile. Maybe we can enlarge the `requirementCheckLongDelay` to 400ms.



##########
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/AbstractFineGrainedSlotManagerITCase.java:
##########
@@ -752,4 +753,76 @@ public void testAllocationUpdatesIgnoredIfSlotMarkedAsAllocatedAfterSlotReport()
 
         System.setSecurityManager(null);
     }
+
+    @Test
+    public void testRequirementLongDelayOnlyTakeEffectOnce() throws Exception {
+        final CompletableFuture<Void> allocationIdFuture1 = new CompletableFuture<>();
+        final CompletableFuture<Void> allocationIdFuture2 = new CompletableFuture<>();
+        final TaskExecutorGateway taskExecutorGateway1 =
+                new TestingTaskExecutorGatewayBuilder()
+                        .setRequestSlotFunction(
+                                tuple6 -> {
+                                    allocationIdFuture1.complete(null);
+                                    return CompletableFuture.completedFuture(Acknowledge.get());
+                                })
+                        .createTestingTaskExecutorGateway();
+        final TaskExecutorGateway taskExecutorGateway2 =
+                new TestingTaskExecutorGatewayBuilder()
+                        .setRequestSlotFunction(
+                                tuple6 -> {
+                                    allocationIdFuture2.complete(null);
+                                    return CompletableFuture.completedFuture(Acknowledge.get());
+                                })
+                        .createTestingTaskExecutorGateway();
+        final TaskExecutorConnection taskExecutorConnection1 =
+                new TaskExecutorConnection(ResourceID.generate(), taskExecutorGateway1);
+        final TaskExecutorConnection taskExecutorConnection2 =
+                new TaskExecutorConnection(ResourceID.generate(), taskExecutorGateway2);
+        new Context() {
+            {
+                runTest(
+                        () -> {
+                            final CompletableFuture<Boolean> registerTaskManagerFuture1 =
+                                    new CompletableFuture<>();
+                            final CompletableFuture<Boolean> registerTaskManagerFuture2 =
+                                    new CompletableFuture<>();
+                            runInMainThread(
+                                    () -> {
+                                        getSlotManager().enlargeRequirementsCheckDelayOnce();
+                                        getSlotManager()
+                                                .processResourceRequirements(
+                                                        createResourceRequirements(
+                                                                new JobID(),
+                                                                2,
+                                                                DEFAULT_SLOT_RESOURCE_PROFILE));
+                                        registerTaskManagerFuture1.complete(
+                                                getSlotManager()
+                                                        .registerTaskManager(
+                                                                taskExecutorConnection1,
+                                                                new SlotReport(),
+                                                                DEFAULT_SLOT_RESOURCE_PROFILE,
+                                                                DEFAULT_SLOT_RESOURCE_PROFILE));
+                                    });
+                            assertThat(
+                                    assertFutureCompleteAndReturn(registerTaskManagerFuture1),
+                                    is(true));
+                            assertFutureNotComplete(allocationIdFuture1);
+                            allocationIdFuture1.get(200, TimeUnit.MILLISECONDS);

Review Comment:
   And here, we just use `assertFutureCompleteAndReturn`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org