You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/07/18 23:57:54 UTC

[GitHub] [flink] rkhachatryan opened a new pull request, #20300: [WIP][FLINK-28597][state] Discard non-initial checkpoints without a delay

rkhachatryan opened a new pull request, #20300:
URL: https://github.com/apache/flink/pull/20300

   ## What is the purpose of the change
   
   ```
   Newly created checkpoints can be discarded right after subsumption. But the
   initial checkpoint needs to be kept until all of its private AND shared state
   is not in use.
   Keeping any checkpoint for longer leaves its folder undeleted on job
   cancellation (and also on crash or JM failover).
   
   This change limits createdByCheckpointID tracking to only initial checkpoints.
   ```
   
   ## Verifying this change
   
   **TBD**
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: yes
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] rkhachatryan commented on pull request #20300: [WIP][FLINK-28597][state] Discard non-initial checkpoints without a delay

Posted by GitBox <gi...@apache.org>.
rkhachatryan commented on PR #20300:
URL: https://github.com/apache/flink/pull/20300#issuecomment-1192522229

   Superceded by #20313


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] fredia commented on a diff in pull request #20300: [WIP][FLINK-28597][state] Discard non-initial checkpoints without a delay

Posted by GitBox <gi...@apache.org>.
fredia commented on code in PR #20300:
URL: https://github.com/apache/flink/pull/20300#discussion_r924011917


##########
flink-runtime/src/main/java/org/apache/flink/runtime/state/SharedStateRegistryImpl.java:
##########
@@ -192,6 +208,8 @@ public void registerAll(
     @Override
     public void registerAllAfterRestored(CompletedCheckpoint checkpoint, RestoreMode mode) {

Review Comment:
   This method is called by `CompletedCheckpoint#registerSharedStatesAfterRestored()`, and `registerSharedStatesAfterRestored()` is called by `CheckpointCoordinator#restoreSavepoint()` and `DEFAULT_FACTORY`.
    I'm not sure if the call in `DEFAULT_FACTORY` has other effects. If a job failover and restarts, does `highestRestoredCheckpointID` update?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] flinkbot commented on pull request #20300: [WIP][FLINK-28597][state] Discard non-initial checkpoints without a delay

Posted by GitBox <gi...@apache.org>.
flinkbot commented on PR #20300:
URL: https://github.com/apache/flink/pull/20300#issuecomment-1188454872

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "acc4f2e262947cdda78eea45aaa1f1bca976e16b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "acc4f2e262947cdda78eea45aaa1f1bca976e16b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * acc4f2e262947cdda78eea45aaa1f1bca976e16b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] rkhachatryan closed pull request #20300: [WIP][FLINK-28597][state] Discard non-initial checkpoints without a delay

Posted by GitBox <gi...@apache.org>.
rkhachatryan closed pull request #20300: [WIP][FLINK-28597][state] Discard non-initial checkpoints without a delay
URL: https://github.com/apache/flink/pull/20300


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] rkhachatryan commented on a diff in pull request #20300: [WIP][FLINK-28597][state] Discard non-initial checkpoints without a delay

Posted by GitBox <gi...@apache.org>.
rkhachatryan commented on code in PR #20300:
URL: https://github.com/apache/flink/pull/20300#discussion_r924251532


##########
flink-runtime/src/main/java/org/apache/flink/runtime/state/SharedStateRegistryImpl.java:
##########
@@ -192,6 +208,8 @@ public void registerAll(
     @Override
     public void registerAllAfterRestored(CompletedCheckpoint checkpoint, RestoreMode mode) {

Review Comment:
   Good question.
   Yes, `highestRestoredCheckpointID` will be updated on failover (potentially to a higher checkpoint, but that would not violate correctness).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] fredia commented on a diff in pull request #20300: [WIP][FLINK-28597][state] Discard non-initial checkpoints without a delay

Posted by GitBox <gi...@apache.org>.
fredia commented on code in PR #20300:
URL: https://github.com/apache/flink/pull/20300#discussion_r924330950


##########
flink-runtime/src/main/java/org/apache/flink/runtime/state/SharedStateRegistryImpl.java:
##########
@@ -192,6 +208,8 @@ public void registerAll(
     @Override
     public void registerAllAfterRestored(CompletedCheckpoint checkpoint, RestoreMode mode) {

Review Comment:
   Yes, that would not violate correctness, but that would delay the deletion of more checkpoints(the completed checkpoints which are stored in ZK) folders.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org