Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/05/13 06:56:24 UTC

[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #200: [FLINK-27495] Observe last savepoint status directly from cluster

gyfora commented on code in PR #200:
URL: https://github.com/apache/flink-kubernetes-operator/pull/200#discussion_r872058500


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/FlinkService.java:
##########
@@ -528,6 +538,71 @@ public void triggerSavepoint(
         }
     }
 
+    public Optional<Savepoint> getLastCheckpoint(JobID jobId, Configuration conf) throws Exception {
+        try (RestClusterClient<String> clusterClient =
+                (RestClusterClient<String>) getClusterClient(conf)) {
+
+            var headers = CustomCheckpointingStatisticsHeaders.getInstance();
+            var params = headers.getUnresolvedMessageParameters();
+            params.jobPathParameter.resolve(jobId);
+
+            CompletableFuture<CheckpointHistoryWrapper> response =
+                    clusterClient.sendRequest(headers, params, EmptyRequestBody.getInstance());
+
+            var checkpoints =
+                    response.get(
+                            configManager
+                                    .getOperatorConfiguration()
+                                    .getFlinkClientTimeout()
+                                    .getSeconds(),
+                            TimeUnit.SECONDS);
+
+            var latestCheckpointOpt =
+                    checkpoints.getHistory().stream()
+                            .filter(
+                                    cp ->
+                                            CheckpointStatsStatus.valueOf(
+                                                            cp.get(
+                                                                            CheckpointStatistics
+                                                                                    .FIELD_NAME_STATUS)
+                                                                    .asText())
+                                                    == CheckpointStatsStatus.COMPLETED)
+                            .filter(
+                                    cp ->
+                                            !cp.get(
+                                                            CheckpointStatistics
+                                                                    .CompletedCheckpointStatistics
+                                                                    .FIELD_NAME_EXTERNAL_PATH)
+                                                    .asText()
+                                                    .equals(
+                                                            NonPersistentMetadataCheckpointStorageLocation
+                                                                    .EXTERNAL_POINTER))
+                            .max(
+                                    Comparator.comparingLong(
+                                            cp ->
+                                                    cp.get(CheckpointStatistics.FIELD_NAME_ID)
+                                                            .asLong()))
+                            .map(
+                                    cp ->
+                                            new Savepoint(

Review Comment:
   I decided to do this because we only record it in a special scenario: the job is in a terminal state, and the last snapshot is a checkpoint only when the job failed or finished (otherwise it would be a savepoint, due to stop-with-savepoint).
   
   I put it in the savepoint info to keep things simple on the operator side: this avoids introducing new status fields and keeps the logic simple.
   
   The savepoint info is not the real source of truth in any case, because anything can happen that prevents us from recording the information, so I think this is fair. I believe the savepoint history feature will improve this further.
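
For readers following the diff above, the selection rule it applies can be sketched in isolation. This is a minimal, self-contained illustration and not the operator's code: `CheckpointEntry`, `lastRestorablePath`, and the `s3://bucket/...` paths are hypothetical, and the placeholder string is assumed to match the value of `NonPersistentMetadataCheckpointStorageLocation.EXTERNAL_POINTER`.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LastCheckpointSketch {

    // Assumption: the pointer Flink reports when checkpoint metadata is not persisted.
    static final String NON_PERSISTENT_POINTER = "<checkpoint-not-externally-addressable>";

    /** Hypothetical, simplified view of one entry in the checkpoint history. */
    static final class CheckpointEntry {
        final long id;
        final String status;
        final String externalPath;

        CheckpointEntry(long id, String status, String externalPath) {
            this.id = id;
            this.status = status;
            this.externalPath = externalPath;
        }
    }

    /**
     * Returns the external path of the completed, externally addressable
     * checkpoint with the highest id, if there is one.
     */
    static Optional<String> lastRestorablePath(List<CheckpointEntry> history) {
        return history.stream()
                // Keep only checkpoints that completed.
                .filter(cp -> "COMPLETED".equals(cp.status))
                // Drop checkpoints whose metadata cannot be used for a restore.
                .filter(cp -> cp.externalPath != null
                        && !NON_PERSISTENT_POINTER.equals(cp.externalPath))
                // The highest checkpoint id is the most recent one.
                .max(Comparator.comparingLong(cp -> cp.id))
                .map(cp -> cp.externalPath);
    }

    public static void main(String[] args) {
        List<CheckpointEntry> history = Arrays.asList(
                new CheckpointEntry(5, "COMPLETED", "s3://bucket/chk-5"),
                new CheckpointEntry(7, "COMPLETED", NON_PERSISTENT_POINTER),
                new CheckpointEntry(6, "COMPLETED", "s3://bucket/chk-6"),
                new CheckpointEntry(8, "FAILED", null));

        // Prints Optional[s3://bucket/chk-6]
        System.out.println(lastRestorablePath(history));
    }
}
```

An empty `Optional` here would correspond to the case where nothing restorable can be recorded, which fits the point above that the recorded info is best-effort rather than a source of truth.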


