Posted to dev@flink.apache.org by "Jacob Park (JIRA)" <ji...@apache.org> on 2018/03/30 00:09:00 UTC

[jira] [Created] (FLINK-9114) Enable user-provided, custom CheckpointRecoveryFactory for non-HA deployments

Jacob Park created FLINK-9114:
---------------------------------

             Summary: Enable user-provided, custom CheckpointRecoveryFactory for non-HA deployments
                 Key: FLINK-9114
                 URL: https://issues.apache.org/jira/browse/FLINK-9114
             Project: Flink
          Issue Type: Improvement
          Components: Configuration, State Backends, Checkpointing
            Reporter: Jacob Park
            Assignee: Jacob Park


When you operate a Flink application that writes externalized checkpoints to S3, it is difficult to determine which checkpoint is the latest one to recover from. S3 provides read-after-write consistency only for PUTs of new objects; listing an S3 path is only eventually consistent, so a listing may omit the most recent checkpoint and we cannot rely on it to choose the checkpoint to recover from.

The goal of this improvement is to let users provide a custom CheckpointRecoveryFactory for non-HA deployments. With it, we can co-publish checkpoint metadata to a strongly consistent data store, and fail a checkpoint when we cannot guarantee that we will later know where it is in S3.
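
To make the intent concrete, here is a minimal sketch of the co-publishing idea, not a proposed implementation. All names in it (ConsistentPointerStore, InMemoryPointerStore, CheckpointPointer, CoPublishingCheckpointStore) are hypothetical and are not Flink classes; the in-memory store only stands in for a real strongly consistent data store, and the s3:// paths in the usage note are made up.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical record of where a completed checkpoint lives. */
final class CheckpointPointer {
    final long checkpointId;
    final String externalPath;

    CheckpointPointer(long checkpointId, String externalPath) {
        this.checkpointId = checkpointId;
        this.externalPath = externalPath;
    }
}

/** Hypothetical abstraction over a strongly consistent data store. */
interface ConsistentPointerStore {
    /** Returns false if the write could not be confirmed. */
    boolean put(String jobId, CheckpointPointer pointer);

    /** Returns the pointer with the highest checkpoint ID recorded for the job, if any. */
    Optional<CheckpointPointer> latest(String jobId);
}

/** In-memory stand-in (assumption) so that the sketch runs without external services. */
class InMemoryPointerStore implements ConsistentPointerStore {
    private final Map<String, CheckpointPointer> latestByJob = new ConcurrentHashMap<>();
    private volatile boolean available = true;

    /** Simulates an outage of the store. */
    void setAvailable(boolean available) {
        this.available = available;
    }

    @Override
    public boolean put(String jobId, CheckpointPointer pointer) {
        if (!available) {
            return false;
        }
        // Keep only the pointer with the highest checkpoint ID.
        latestByJob.merge(jobId, pointer,
                (old, neu) -> neu.checkpointId > old.checkpointId ? neu : old);
        return true;
    }

    @Override
    public Optional<CheckpointPointer> latest(String jobId) {
        return Optional.ofNullable(latestByJob.get(jobId));
    }
}

/** Fails a checkpoint when its location cannot be co-published. */
class CoPublishingCheckpointStore {
    private final String jobId;
    private final ConsistentPointerStore store;

    CoPublishingCheckpointStore(String jobId, ConsistentPointerStore store) {
        this.jobId = jobId;
        this.store = store;
    }

    /** Called once the checkpoint has been written to S3. */
    void addCheckpoint(long checkpointId, String externalPath) {
        if (!store.put(jobId, new CheckpointPointer(checkpointId, externalPath))) {
            throw new IllegalStateException(
                    "Could not publish location of checkpoint " + checkpointId);
        }
    }

    /** Recovery reads the pointer from the store instead of listing an S3 path. */
    Optional<CheckpointPointer> getLatestCheckpoint() {
        return store.latest(jobId);
    }
}
```

For example, after addCheckpoint(1L, "s3://bucket/chk-1") and addCheckpoint(2L, "s3://bucket/chk-2"), getLatestCheckpoint() returns the pointer for checkpoint 2. If the store then becomes unavailable, a further addCheckpoint call throws, so the checkpoint is treated as failed rather than left discoverable only through an S3 listing.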

I propose the following changes:
 # Modify AbstractNonHaServices and StandaloneHaServices to accept an Executor for the custom CheckpointRecoveryFactory.
 # Create a CheckpointRecoveryFactoryLoader to provide the custom CheckpointRecoveryFactory from configurations.
 # Add new configurations for this feature.
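
As a rough illustration of steps 2 and 3 (step 1 is not covered), a loader could resolve the factory class from configuration along the lines below. This is a sketch under stated assumptions: RecoveryFactory is a simplified stand-in for Flink's CheckpointRecoveryFactory, a plain Map replaces Flink's Configuration, and the configuration key is invented for this example.

```java
import java.util.Map;

/** Simplified stand-in (assumption) for Flink's CheckpointRecoveryFactory. */
interface RecoveryFactory {
    String name();
}

/** Stand-in for the default factory used by non-HA deployments. */
class StandaloneRecoveryFactory implements RecoveryFactory {
    @Override
    public String name() {
        return "standalone";
    }
}

/** Hypothetical loader; the configuration key below is not an existing Flink option. */
final class CheckpointRecoveryFactoryLoader {
    static final String FACTORY_CLASS_KEY = "state.checkpoints.recovery-factory.class";

    private CheckpointRecoveryFactoryLoader() {
    }

    static RecoveryFactory load(Map<String, String> config, ClassLoader classLoader) {
        String className = config.get(FACTORY_CLASS_KEY);
        if (className == null || className.trim().isEmpty()) {
            // Nothing configured: keep the default behaviour.
            return new StandaloneRecoveryFactory();
        }
        try {
            // Load the user class and require a no-argument constructor.
            return Class.forName(className.trim(), true, classLoader)
                    .asSubclass(RecoveryFactory.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Cannot load checkpoint recovery factory: " + className, e);
        }
    }
}
```

With no value configured, the sketch falls back to the standalone factory, so existing non-HA deployments would be unaffected. A class name that is missing or that does not implement the interface is rejected with an IllegalArgumentException at load time.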

We considered the pluggable StateBackend and a potential pluggable HighAvailabilityServices. Both were too convoluted for our problem, so we would like a custom CheckpointRecoveryFactory instead.


