Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/04/27 09:17:22 UTC

[GitHub] [flink] XComp opened a new pull request, #19591: [FLINK-24491][runtime] Make the job termination wait until the archiving of ExecutionGraphInfo finishes

XComp opened a new pull request, #19591:
URL: https://github.com/apache/flink/pull/19591

   1.14 Backport PR of parent PR #19275
   
   ## What is the purpose of the change
   
   Commit `b3a9dcb` was cherry-picked, but the backport required some larger changes because of the changes we applied to the Dispatcher in 1.15.
   
   ## Verifying this change
   
   * The tests from the parent PR were added but required some refactoring
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] Thesharing commented on a diff in pull request #19591: [BP-1.14][FLINK-24491][runtime] Make the job termination wait until the archiving of ExecutionGraphInfo finishes

Posted by GitBox <gi...@apache.org>.
Thesharing commented on code in PR #19591:
URL: https://github.com/apache/flink/pull/19591#discussion_r859595215


##########
flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java:
##########
@@ -869,14 +873,22 @@ protected CleanupJobState jobReachedTerminalState(ExecutionGraphInfo executionGr
                     terminalJobStatus);
         }
 
-        archiveExecutionGraph(executionGraphInfo);
+        writeToExecutionGraphInfoStore(executionGraphInfo);
+
+        if (!terminalJobStatus.isGloballyTerminalState()) {
+            return CompletableFuture.completedFuture(CleanupJobState.LOCAL);
+        }
+
+        // do not create an archive for suspended jobs, as this would eventually lead to
+        // multiple archive attempts which we currently do not support
+        CompletableFuture<Acknowledge> archiveToHistoryServerFuture =
+                archiveExecutionGraphToHistoryServer(executionGraphInfo);
 
-        return terminalJobStatus.isGloballyTerminalState()
-                ? CleanupJobState.GLOBAL
-                : CleanupJobState.LOCAL;
+        return archiveToHistoryServerFuture.thenApplyAsync(

Review Comment:
   Since `archiveExecutionGraphToHistoryServer` returns a `CompletableFuture` in the main thread, could we just use `thenApply` instead of `thenApplyAsync` here?
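
The trade-off behind this question is plain JDK behavior and can be sketched without any Flink code. The class and method names below are hypothetical and the executor is a stand-in, not Flink's main thread executor. On an already completed future, `thenApply` runs the callback in the calling thread, while `thenApplyAsync` always hands it to an executor:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; not part of Flink.
class ThenApplyThreadDemo {

    /** Name of the thread that runs a thenApply callback on an already completed future. */
    static String threadOfThenApply() {
        CompletableFuture<String> done = CompletableFuture.completedFuture("ack");
        // The future is already complete, so the callback runs right here in the caller.
        return done.thenApply(ack -> Thread.currentThread().getName()).join();
    }

    /** Name of the thread that runs a thenApplyAsync callback on an explicit executor. */
    static String threadOfThenApplyAsync() {
        // Stand-in executor with a recognizable thread name (an assumption for this demo).
        ExecutorService executor =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "demo-executor"));
        try {
            CompletableFuture<String> done = CompletableFuture.completedFuture("ack");
            // The callback is submitted to the executor even though the future is complete.
            return done.thenApplyAsync(ack -> Thread.currentThread().getName(), executor)
                    .join();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("caller:         " + Thread.currentThread().getName());
        System.out.println("thenApply:      " + threadOfThenApply());
        System.out.println("thenApplyAsync: " + threadOfThenApplyAsync());
    }
}
```

Note that if the future is not yet complete when `thenApply` is registered, the callback runs in whichever thread completes the future, so dropping the `Async` variant is only safe when that thread is an acceptable place to run the callback.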





[GitHub] [flink] XComp commented on pull request #19591: [BP-1.14][FLINK-24491][runtime] Make the job termination wait until the archiving of ExecutionGraphInfo finishes

Posted by GitBox <gi...@apache.org>.
XComp commented on PR #19591:
URL: https://github.com/apache/flink/pull/19591#issuecomment-1110924906

   I had to cherry-pick the hotfixes from PR #19427 as well to make the test wait for the job termination again.




[GitHub] [flink] XComp commented on pull request #19591: [BP-1.14][FLINK-24491][runtime] Make the job termination wait until the archiving of ExecutionGraphInfo finishes

Posted by GitBox <gi...@apache.org>.
XComp commented on PR #19591:
URL: https://github.com/apache/flink/pull/19591#issuecomment-1110769400

   @Thesharing I decided to create a 1.14 backport. Could you have a look?
   @zentol Could you have a look as well? Resolving the backport conflicts turned out to be a bit more complex because of the changes we applied to the Dispatcher in 1.15.




[GitHub] [flink] XComp commented on pull request #19591: [BP-1.14][FLINK-24491][runtime] Make the job termination wait until the archiving of ExecutionGraphInfo finishes

Posted by GitBox <gi...@apache.org>.
XComp commented on PR #19591:
URL: https://github.com/apache/flink/pull/19591#issuecomment-1111789905

   Force-pushed a reorder of the commits...




[GitHub] [flink] XComp merged pull request #19591: [BP-1.14][FLINK-24491][runtime] Make the job termination wait until the archiving of ExecutionGraphInfo finishes

Posted by GitBox <gi...@apache.org>.
XComp merged PR #19591:
URL: https://github.com/apache/flink/pull/19591




[GitHub] [flink] XComp commented on a diff in pull request #19591: [BP-1.14][FLINK-24491][runtime] Make the job termination wait until the archiving of ExecutionGraphInfo finishes

Posted by GitBox <gi...@apache.org>.
XComp commented on code in PR #19591:
URL: https://github.com/apache/flink/pull/19591#discussion_r859625127


##########
flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java:
##########
@@ -869,14 +873,22 @@ protected CleanupJobState jobReachedTerminalState(ExecutionGraphInfo executionGr
                     terminalJobStatus);
         }
 
-        archiveExecutionGraph(executionGraphInfo);
+        writeToExecutionGraphInfoStore(executionGraphInfo);
+
+        if (!terminalJobStatus.isGloballyTerminalState()) {
+            return CompletableFuture.completedFuture(CleanupJobState.LOCAL);
+        }
+
+        // do not create an archive for suspended jobs, as this would eventually lead to
+        // multiple archive attempts which we currently do not support
+        CompletableFuture<Acknowledge> archiveToHistoryServerFuture =
+                archiveExecutionGraphToHistoryServer(executionGraphInfo);
 
-        return terminalJobStatus.isGloballyTerminalState()
-                ? CleanupJobState.GLOBAL
-                : CleanupJobState.LOCAL;
+        return archiveToHistoryServerFuture.thenApplyAsync(

Review Comment:
   You have a point. I force-pushed that change...





[GitHub] [flink] flinkbot commented on pull request #19591: [BP-1.14][FLINK-24491][runtime] Make the job termination wait until the archiving of ExecutionGraphInfo finishes

Posted by GitBox <gi...@apache.org>.
flinkbot commented on PR #19591:
URL: https://github.com/apache/flink/pull/19591#issuecomment-1110770011

   ## CI report:
   
   * 8b243bbbea5a6561777b5d95415312806bba0030 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>

