You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Samrat002 (via GitHub)" <gi...@apache.org> on 2023/04/27 18:32:24 UTC

[GitHub] [flink] Samrat002 opened a new pull request, #22495: [FLINK-31080][Flink-runtime]Idle slots are not released due to a mismatch in time between DeclarativeSlotPoolService and SlotSharingSlotAllocator

Samrat002 opened a new pull request, #22495:
URL: https://github.com/apache/flink/pull/22495

   
   ## What is the purpose of the change
   Idle slots are not released due to a mismatch in time between DeclarativeSlotPoolService and SlotSharingSlotAllocator
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / no) no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) no
     - The serializers: (yes / no / don't know) no
     - The runtime per-record code paths (performance sensitive): (yes / no / don't know) no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know) no
     - The S3 file system connector: (yes / no / don't know) no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / no) no
     - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Samrat002 commented on a diff in pull request #22495: [FLINK-31080][Flink-runtime]Idle slots are not released due to a mismatch in time between DeclarativeSlotPoolService and SlotSharingSlotAllocator

Posted by "Samrat002 (via GitHub)" <gi...@apache.org>.
Samrat002 commented on code in PR #22495:
URL: https://github.com/apache/flink/pull/22495#discussion_r1179966813


##########
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/DeclarativeSlotPoolService.java:
##########
@@ -134,7 +139,25 @@ public final void start(
      *
      * @param componentMainThreadExecutor componentMainThreadExecutor used by this slot pool service
      */
-    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {}
+    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {
+        this.componentMainThreadExecutor = componentMainThreadExecutor;
+
+        componentMainThreadExecutor.schedule(
+                this::checkIdleSlotTimeout,
+                idleSlotTimeout.toMilliseconds(),
+                TimeUnit.MILLISECONDS);

Review Comment:
   <img width="1414" alt="Screenshot 2023-04-28 at 11 39 33 AM" src="https://user-images.githubusercontent.com/40290566/235068090-96b56fa8-ac5c-4d48-91e0-96cf4c52d269.png">
   
   I am trying to open to get more understanding  [FLINK-32399](https://issues.apache.org/jira/browse/FLINK-32399). Is this issue removed or need permission ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Samrat002 commented on a diff in pull request #22495: [FLINK-31080][Flink-runtime]Idle slots are not released due to a mismatch in time between DeclarativeSlotPoolService and SlotSharingSlotAllocator

Posted by "Samrat002 (via GitHub)" <gi...@apache.org>.
Samrat002 commented on code in PR #22495:
URL: https://github.com/apache/flink/pull/22495#discussion_r1179966813


##########
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/DeclarativeSlotPoolService.java:
##########
@@ -134,7 +139,25 @@ public final void start(
      *
      * @param componentMainThreadExecutor componentMainThreadExecutor used by this slot pool service
      */
-    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {}
+    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {
+        this.componentMainThreadExecutor = componentMainThreadExecutor;
+
+        componentMainThreadExecutor.schedule(
+                this::checkIdleSlotTimeout,
+                idleSlotTimeout.toMilliseconds(),
+                TimeUnit.MILLISECONDS);

Review Comment:
   <img width="1414" alt="Screenshot 2023-04-28 at 11 39 33 AM" src="https://user-images.githubusercontent.com/40290566/235068090-96b56fa8-ac5c-4d48-91e0-96cf4c52d269.png">
   
   I am trying to open  [FLINK-32399](https://issues.apache.org/jira/browse/FLINK-32399) and understand the change but unfortunately not able to view anything. Is this issue removed or need permission ?
   
   @reswqa can you help here please 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] reswqa commented on a diff in pull request #22495: [FLINK-31080][Flink-runtime]Idle slots are not released due to a mismatch in time between DeclarativeSlotPoolService and SlotSharingSlotAllocator

Posted by "reswqa (via GitHub)" <gi...@apache.org>.
reswqa commented on code in PR #22495:
URL: https://github.com/apache/flink/pull/22495#discussion_r1179978366


##########
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/DeclarativeSlotPoolService.java:
##########
@@ -134,7 +139,25 @@ public final void start(
      *
      * @param componentMainThreadExecutor componentMainThreadExecutor used by this slot pool service
      */
-    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {}
+    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {
+        this.componentMainThreadExecutor = componentMainThreadExecutor;
+
+        componentMainThreadExecutor.schedule(
+                this::checkIdleSlotTimeout,
+                idleSlotTimeout.toMilliseconds(),
+                TimeUnit.MILLISECONDS);

Review Comment:
   Sorry, my mistake. It should be [FLINK-31399]



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] flinkbot commented on pull request #22495: [FLINK-31080][Flink-runtime]Idle slots are not released due to a mismatch in time between DeclarativeSlotPoolService and SlotSharingSlotAllocator

Posted by "flinkbot (via GitHub)" <gi...@apache.org>.
flinkbot commented on PR #22495:
URL: https://github.com/apache/flink/pull/22495#issuecomment-1526151788

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b64851ad2e4534020d637e26e1b14a61db38e555",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b64851ad2e4534020d637e26e1b14a61db38e555",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b64851ad2e4534020d637e26e1b14a61db38e555 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Samrat002 commented on a diff in pull request #22495: [FLINK-31080][Flink-runtime]Idle slots are not released due to a mismatch in time between DeclarativeSlotPoolService and SlotSharingSlotAllocator

Posted by "Samrat002 (via GitHub)" <gi...@apache.org>.
Samrat002 commented on code in PR #22495:
URL: https://github.com/apache/flink/pull/22495#discussion_r1179967566


##########
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/DeclarativeSlotPoolService.java:
##########
@@ -134,7 +139,25 @@ public final void start(
      *
      * @param componentMainThreadExecutor componentMainThreadExecutor used by this slot pool service
      */
-    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {}
+    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {

Review Comment:
   yes make sense ! making changes 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Samrat002 commented on a diff in pull request #22495: [FLINK-31080][Flink-runtime]Idle slots are not released due to a mismatch in time between DeclarativeSlotPoolService and SlotSharingSlotAllocator

Posted by "Samrat002 (via GitHub)" <gi...@apache.org>.
Samrat002 commented on code in PR #22495:
URL: https://github.com/apache/flink/pull/22495#discussion_r1179966813


##########
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/DeclarativeSlotPoolService.java:
##########
@@ -134,7 +139,25 @@ public final void start(
      *
      * @param componentMainThreadExecutor componentMainThreadExecutor used by this slot pool service
      */
-    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {}
+    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {
+        this.componentMainThreadExecutor = componentMainThreadExecutor;
+
+        componentMainThreadExecutor.schedule(
+                this::checkIdleSlotTimeout,
+                idleSlotTimeout.toMilliseconds(),
+                TimeUnit.MILLISECONDS);

Review Comment:
   <img width="1414" alt="Screenshot 2023-04-28 at 11 39 33 AM" src="https://user-images.githubusercontent.com/40290566/235068090-96b56fa8-ac5c-4d48-91e0-96cf4c52d269.png">
   
   I am trying to open to get more understanding  [FLINK-32399](https://issues.apache.org/jira/browse/FLINK-32399). Is this issue removed ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Samrat002 commented on pull request #22495: [FLINK-31080][Flink-runtime]Idle slots are not released due to a mismatch in time between DeclarativeSlotPoolService and SlotSharingSlotAllocator

Posted by "Samrat002 (via GitHub)" <gi...@apache.org>.
Samrat002 commented on PR #22495:
URL: https://github.com/apache/flink/pull/22495#issuecomment-1536278946

   Thank you @dmvk  for sharing detailed insights on the `DeclarativeSlotPool` and probable risk on the approach on this patch.
   
   > Another option would be exposing the shared clock instance to the AdaptiveScheduler and SlotSharingSlotAllocator, but this might get out of hand quickly.
   
   I can see we can get clock instances in AdaptiveScheduler class. but you mentioned about `this might get out of hand quickly`. [request] If you can elaborate more for me to understand better that will be really helpful for me to understand. 
   
   Thank you 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] reswqa commented on a diff in pull request #22495: [FLINK-31080][Flink-runtime]Idle slots are not released due to a mismatch in time between DeclarativeSlotPoolService and SlotSharingSlotAllocator

Posted by "reswqa (via GitHub)" <gi...@apache.org>.
reswqa commented on code in PR #22495:
URL: https://github.com/apache/flink/pull/22495#discussion_r1179914378


##########
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/DeclarativeSlotPoolService.java:
##########
@@ -134,7 +139,25 @@ public final void start(
      *
      * @param componentMainThreadExecutor componentMainThreadExecutor used by this slot pool service
      */
-    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {}
+    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {
+        this.componentMainThreadExecutor = componentMainThreadExecutor;
+
+        componentMainThreadExecutor.schedule(
+                this::checkIdleSlotTimeout,
+                idleSlotTimeout.toMilliseconds(),
+                TimeUnit.MILLISECONDS);

Review Comment:
   IIUC, you want to move the idle slot check logic from `AdaptiveScheduler` to `DeclarativeSlotPoolService`. But you didn't remove the code there, this is what confuses me. And It seems that this change is not completely aligned with FLINK-32399, is this safe?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] reswqa commented on a diff in pull request #22495: [FLINK-31080][Flink-runtime]Idle slots are not released due to a mismatch in time between DeclarativeSlotPoolService and SlotSharingSlotAllocator

Posted by "reswqa (via GitHub)" <gi...@apache.org>.
reswqa commented on code in PR #22495:
URL: https://github.com/apache/flink/pull/22495#discussion_r1179915692


##########
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/DeclarativeSlotPoolService.java:
##########
@@ -134,7 +139,25 @@ public final void start(
      *
      * @param componentMainThreadExecutor componentMainThreadExecutor used by this slot pool service
      */
-    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {}
+    protected void onStart(ComponentMainThreadExecutor componentMainThreadExecutor) {

Review Comment:
   This check logic already exists in its subclass(i.e. `DeclarativeSlotPoolBridge`), we'd better make some changes to the subclass to reduce duplication of code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org