You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/06/06 14:21:55 UTC

[GitHub] [flink] Aitozi opened a new pull request, #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Aitozi opened a new pull request, #19886:
URL: https://github.com/apache/flink/pull/19886

   …larativeSlotManager
   
   
   ## What is the purpose of the change
   
   This PR is meant to introduce the delay for resource requirements in DeclarativeSlotManager. It is the preceding step to solve the https://github.com/apache/flink/pull/19840 
   
   
   ## Brief change log
   
   - Make the delay option configurable (but exclude from the doc). I think it's an internal optimization
   - Use the same mechanism in DeclarativeSlotManager with FineGrainedSlotManager.
   
   
   ## Verifying this change
   
   A test is added to verify it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Aitozi commented on a diff in pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
Aitozi commented on code in PR #19886:
URL: https://github.com/apache/flink/pull/19886#discussion_r890717053


##########
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManager.java:
##########
@@ -407,13 +420,39 @@ public void freeSlot(SlotID slotId, AllocationID allocationId) {
         LOG.debug("Freeing slot {}.", slotId);
 
         slotTracker.notifyFree(slotId);
-        checkResourceRequirements();
+        checkResourceRequirementsWithDelay();
     }
 
     // ---------------------------------------------------------------------------------------------
     // Requirement matching
     // ---------------------------------------------------------------------------------------------
 
+    /**
+     * Depending on the implementation of {@link ResourceAllocationStrategy}, checking resource
+     * requirements and potentially making a re-allocation can be heavy. In order to cover more
+     * changes with each check, thus reduce the frequency of unnecessary re-allocations, the checks
+     * are performed with a slight delay.
+     */
+    private void checkResourceRequirementsWithDelay() {
+        if (requirementsCheckDelay.toMillis() <= 0) {

Review Comment:
   Yes, will fix it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Aitozi commented on pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
Aitozi commented on PR #19886:
URL: https://github.com/apache/flink/pull/19886#issuecomment-1152229224

   Addressed your last comments, please take a look again @KarmaGYZ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] flinkbot commented on pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
flinkbot commented on PR #19886:
URL: https://github.com/apache/flink/pull/19886#issuecomment-1147511731

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8fc9b64d16b219f1d0b0c6a533ba55e93d40a13b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8fc9b64d16b219f1d0b0c6a533ba55e93d40a13b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8fc9b64d16b219f1d0b0c6a533ba55e93d40a13b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Aitozi commented on a diff in pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
Aitozi commented on code in PR #19886:
URL: https://github.com/apache/flink/pull/19886#discussion_r890724533


##########
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManagerTest.java:
##########
@@ -1424,6 +1428,37 @@ public void testReclaimInactiveSlotsOnClearRequirements() throws Exception {
         }
     }
 
+    @Test
+    public void testProcessResourceRequirementsWithDelay() throws Exception {

Review Comment:
   Verified by new added
   
   ```
   Assertions.assertEquals(1, allocatedResourceCounter.get());
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Aitozi commented on a diff in pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
Aitozi commented on code in PR #19886:
URL: https://github.com/apache/flink/pull/19886#discussion_r890716617


##########
flink-core/src/main/java/org/apache/flink/configuration/ResourceManagerOptions.java:
##########
@@ -142,6 +142,13 @@ public class ResourceManagerOptions {
                                     + START_WORKER_MAX_FAILURE_RATE.key()
                                     + "') is reached.");
 
+    @Documentation.ExcludeFromDocumentation

Review Comment:
   It is inspired by the comments in the `ResourceManagerRuntimeServices` to make this configurable. But I think it do not have the requirements to change the default value now. But I can remove the annotation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Aitozi commented on pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
Aitozi commented on PR #19886:
URL: https://github.com/apache/flink/pull/19886#issuecomment-1149417828

   I have addressed the comments, please take another look @KarmaGYZ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Aitozi commented on a diff in pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
Aitozi commented on code in PR #19886:
URL: https://github.com/apache/flink/pull/19886#discussion_r890728118


##########
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManagerTest.java:
##########
@@ -1424,6 +1428,37 @@ public void testReclaimInactiveSlotsOnClearRequirements() throws Exception {
         }
     }
 
+    @Test
+    public void testProcessResourceRequirementsWithDelay() throws Exception {

Review Comment:
   Yes, updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] KarmaGYZ commented on a diff in pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
KarmaGYZ commented on code in PR #19886:
URL: https://github.com/apache/flink/pull/19886#discussion_r890722048


##########
flink-core/src/main/java/org/apache/flink/configuration/ResourceManagerOptions.java:
##########
@@ -142,6 +142,13 @@ public class ResourceManagerOptions {
                                     + START_WORKER_MAX_FAILURE_RATE.key()
                                     + "') is reached.");
 
+    @Documentation.ExcludeFromDocumentation

Review Comment:
   ```suggestion
       @Documentation.ExcludeFromDocumentation(
               "This is an expert option, that we do not want to expose in the documentation")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] KarmaGYZ commented on a diff in pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
KarmaGYZ commented on code in PR #19886:
URL: https://github.com/apache/flink/pull/19886#discussion_r893008091


##########
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManagerRuntimeServices.java:
##########
@@ -94,7 +90,7 @@ private static SlotManager createSlotManager(
                             SlotManagerUtils.generateTaskManagerTotalResourceProfile(
                                     slotManagerConfiguration.getDefaultWorkerResourceSpec()),
                             slotManagerConfiguration.getNumSlotsPerWorker()),
-                    Time.milliseconds(REQUIREMENTS_CHECK_DELAY_MS));
+                    slotManagerConfiguration.getRequirementCheckDelay());

Review Comment:
   Actually, we do not need to treat it as a separate param.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] KarmaGYZ commented on a diff in pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
KarmaGYZ commented on code in PR #19886:
URL: https://github.com/apache/flink/pull/19886#discussion_r890726474


##########
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManagerTest.java:
##########
@@ -1424,6 +1428,37 @@ public void testReclaimInactiveSlotsOnClearRequirements() throws Exception {
         }
     }
 
+    @Test
+    public void testProcessResourceRequirementsWithDelay() throws Exception {

Review Comment:
   I think we can process another resource requirement before manually triggering the scheduled task. Then the number of the pending task should be still equal to 1.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Aitozi commented on pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
Aitozi commented on PR #19886:
URL: https://github.com/apache/flink/pull/19886#issuecomment-1151921104

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Aitozi commented on a diff in pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
Aitozi commented on code in PR #19886:
URL: https://github.com/apache/flink/pull/19886#discussion_r893036756


##########
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManagerRuntimeServices.java:
##########
@@ -94,7 +90,7 @@ private static SlotManager createSlotManager(
                             SlotManagerUtils.generateTaskManagerTotalResourceProfile(
                                     slotManagerConfiguration.getDefaultWorkerResourceSpec()),
                             slotManagerConfiguration.getNumSlotsPerWorker()),
-                    Time.milliseconds(REQUIREMENTS_CHECK_DELAY_MS));
+                    slotManagerConfiguration.getRequirementCheckDelay());

Review Comment:
   Nice catch, fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Aitozi commented on a diff in pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
Aitozi commented on code in PR #19886:
URL: https://github.com/apache/flink/pull/19886#discussion_r890723542


##########
flink-core/src/main/java/org/apache/flink/configuration/ResourceManagerOptions.java:
##########
@@ -142,6 +142,13 @@ public class ResourceManagerOptions {
                                     + START_WORKER_MAX_FAILURE_RATE.key()
                                     + "') is reached.");
 
+    @Documentation.ExcludeFromDocumentation

Review Comment:
   thanks for your hint 👍🏻



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] KarmaGYZ commented on a diff in pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
KarmaGYZ commented on code in PR #19886:
URL: https://github.com/apache/flink/pull/19886#discussion_r890703027


##########
flink-core/src/main/java/org/apache/flink/configuration/ResourceManagerOptions.java:
##########
@@ -142,6 +142,13 @@ public class ResourceManagerOptions {
                                     + START_WORKER_MAX_FAILURE_RATE.key()
                                     + "') is reached.");
 
+    @Documentation.ExcludeFromDocumentation

Review Comment:
   Why do you introduce this config option? Do you mean to allow users to configure it? If so, why do you exclude it from the documentation?



##########
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManager.java:
##########
@@ -407,13 +420,39 @@ public void freeSlot(SlotID slotId, AllocationID allocationId) {
         LOG.debug("Freeing slot {}.", slotId);
 
         slotTracker.notifyFree(slotId);
-        checkResourceRequirements();
+        checkResourceRequirementsWithDelay();
     }
 
     // ---------------------------------------------------------------------------------------------
     // Requirement matching
     // ---------------------------------------------------------------------------------------------
 
+    /**
+     * Depending on the implementation of {@link ResourceAllocationStrategy}, checking resource
+     * requirements and potentially making a re-allocation can be heavy. In order to cover more
+     * changes with each check, thus reduce the frequency of unnecessary re-allocations, the checks
+     * are performed with a slight delay.
+     */
+    private void checkResourceRequirementsWithDelay() {
+        if (requirementsCheckDelay.toMillis() <= 0) {

Review Comment:
   If we allow users to configure the delay period, we should also check it in `FineGrainedSlotManager`



##########
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManagerTest.java:
##########
@@ -1347,7 +1351,7 @@ public void testAllocationUpdatesIgnoredIfSlotMarkedAsPendingForOtherJob() throw
                         .setSlotTracker(slotTracker)
                         .buildAndStart(
                                 ResourceManagerId.generate(),
-                                ComponentMainThreadExecutorServiceAdapter.forMainThread(),
+                                Executors.directExecutor(),

Review Comment:
   Could you help me to understand why we need this change?



##########
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManagerTest.java:
##########
@@ -1424,6 +1428,37 @@ public void testReclaimInactiveSlotsOnClearRequirements() throws Exception {
         }
     }
 
+    @Test
+    public void testProcessResourceRequirementsWithDelay() throws Exception {

Review Comment:
   We'd better also test that the check method will be triggered only once in one delay period.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Aitozi commented on a diff in pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
Aitozi commented on code in PR #19886:
URL: https://github.com/apache/flink/pull/19886#discussion_r890718047


##########
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManagerTest.java:
##########
@@ -1347,7 +1351,7 @@ public void testAllocationUpdatesIgnoredIfSlotMarkedAsPendingForOtherJob() throw
                         .setSlotTracker(slotTracker)
                         .buildAndStart(
                                 ResourceManagerId.generate(),
-                                ComponentMainThreadExecutorServiceAdapter.forMainThread(),
+                                Executors.directExecutor(),

Review Comment:
   It's an unused change now, reverted



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] KarmaGYZ closed pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…

Posted by GitBox <gi...@apache.org>.
KarmaGYZ closed pull request #19886: [FLINK-27921] Introduce the checkResourceRequirementsWithDelay in Dec…
URL: https://github.com/apache/flink/pull/19886


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org