Posted to issues@flink.apache.org by "pnowojski (via GitHub)" <gi...@apache.org> on 2024/03/12 16:22:48 UTC

[PR] [hotfix] In case of unexpected errors do not lose the primary failure [flink]

pnowojski opened a new pull request, #24487:
URL: https://github.com/apache/flink/pull/24487

   An unexpected error can be, for example, an NPE (`NullPointerException`) thrown while the trigger failure is being handled.
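   
   As a rough illustration of the general idea (not the code from this PR), the sketch below keeps the primary failure visible when a second, unexpected error occurs while handling it. The class `PrimaryFailureSketch`, the methods `strip` and `onFailure`, and the `failureHandling` callback are hypothetical names; `strip` is only a simplified stand-in for `ExceptionUtils.stripCompletionException`. Attaching the primary failure via `addSuppressed` is one possible approach, assumed here for illustration.
   
   ```java
   import java.util.concurrent.CompletionException;

   // Hypothetical sketch: names are illustrative and not taken from Flink.
   final class PrimaryFailureSketch {

       // Simplified stand-in for unwrapping CompletionException layers.
       static Throwable strip(Throwable t) {
           while (t instanceof CompletionException && t.getCause() != null) {
               t = t.getCause();
           }
           return t;
       }

       // Runs the failure handling; if it fails unexpectedly, the primary
       // failure is attached to the secondary one instead of being dropped.
       static void onFailure(Throwable primary, Runnable failureHandling) {
           try {
               primary = strip(primary);
               failureHandling.run();
           } catch (Throwable secondary) {
               if (secondary != primary) {
                   // addSuppressed rejects self-suppression, hence the guard.
                   secondary.addSuppressed(primary);
               }
               throw secondary;
           }
       }

       public static void main(String[] args) {
           Throwable primary =
                   new CompletionException(new IllegalStateException("trigger failed"));
           try {
               onFailure(primary, () -> {
                   throw new NullPointerException("unexpected");
               });
           } catch (NullPointerException e) {
               // The original failure is still reachable from the NPE.
               System.out.println(e.getSuppressed()[0].getMessage()); // prints "trigger failed"
           }
       }
   }
   ```
   
   With this shape, a stack trace of the secondary error also lists the primary failure under "Suppressed:", so neither is lost in the logs.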
   
   ## Verifying this change
   
   This
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [hotfix] In case of unexpected errors do not lose the primary failure [flink]

Posted by "rkhachatryan (via GitHub)" <gi...@apache.org>.
rkhachatryan commented on code in PR #24487:
URL: https://github.com/apache/flink/pull/24487#discussion_r1521803705


##########
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java:
##########
@@ -1046,27 +1046,32 @@ private void onTriggerFailure(
             CheckpointProperties checkpointProperties,
             Throwable throwable) {
         // beautify the stack trace a bit
-        throwable = ExceptionUtils.stripCompletionException(throwable);
-
         try {
-            coordinatorsToCheckpoint.forEach(
-                    OperatorCoordinatorCheckpointContext::abortCurrentTriggering);
+            throwable = ExceptionUtils.stripCompletionException(throwable);
 
-            final CheckpointException cause =
-                    getCheckpointException(
-                            CheckpointFailureReason.TRIGGER_CHECKPOINT_FAILURE, throwable);
+            try {
+                coordinatorsToCheckpoint.forEach(
+                        OperatorCoordinatorCheckpointContext::abortCurrentTriggering);
 
-            if (checkpoint != null && !checkpoint.isDisposed()) {
-                synchronized (lock) {
-                    abortPendingCheckpoint(checkpoint, cause);
+                final CheckpointException cause =
+                        getCheckpointException(
+                                CheckpointFailureReason.TRIGGER_CHECKPOINT_FAILURE, throwable);
+
+                if (checkpoint != null && !checkpoint.isDisposed()) {
+                    synchronized (lock) {
+                        abortPendingCheckpoint(checkpoint, cause);
+                    }
+                } else {
+                    failureManager.handleCheckpointException(
+                            checkpoint, checkpointProperties, cause, null, job, null, statsTracker);
                 }
-            } else {
-                failureManager.handleCheckpointException(
-                        checkpoint, checkpointProperties, cause, null, job, null, statsTracker);
+            } finally {
+                isTriggering = false;
+                executeQueuedRequest();
             }
-        } finally {
-            isTriggering = false;
-            executeQueuedRequest();
+        } catch (Throwable secondThrowable) {

Review Comment:
   Can't we have just one try/catch block?



Re: [PR] [hotfix] In case of unexpected errors do not lose the primary failure [flink]

Posted by "flinkbot (via GitHub)" <gi...@apache.org>.
flinkbot commented on PR #24487:
URL: https://github.com/apache/flink/pull/24487#issuecomment-1992068365

   ## CI report:
   
   * d07a37b06fc847c1f2c6ce148a918c2490f2490c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


Re: [PR] [hotfix] In case of unexpected errors do not lose the primary failure [flink]

Posted by "pnowojski (via GitHub)" <gi...@apache.org>.
pnowojski commented on PR #24487:
URL: https://github.com/apache/flink/pull/24487#issuecomment-1999835148

   Merging. Builds are failing due to unrelated test instabilities/bugs.


Re: [PR] [hotfix] In case of unexpected errors do not lose the primary failure [flink]

Posted by "pnowojski (via GitHub)" <gi...@apache.org>.
pnowojski merged PR #24487:
URL: https://github.com/apache/flink/pull/24487

