Posted to dev@flink.apache.org by "Henrik (JIRA)" <ji...@apache.org> on 2019/05/01 08:09:00 UTC

[jira] [Created] (FLINK-12381) Without failover (aka "HA") configured, full restarts' checkpointing crashes

Henrik created FLINK-12381:
------------------------------

             Summary: Without failover (aka "HA") configured, full restarts' checkpointing crashes
                 Key: FLINK-12381
                 URL: https://issues.apache.org/jira/browse/FLINK-12381
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.8.0
         Environment: Same as FLINK-12379, FLINK-12377, FLINK-12376
            Reporter: Henrik


{code:java}
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: 'gs://example_bucket/flink/checkpoints/00000000000000000000000000000000/chk-16/_metadata' already exists
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.createChannel(GoogleHadoopOutputStream.java:85)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:74)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:797)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:929)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:910)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:807)
    at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:141)
    at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:37)
    at org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.<init>(FsCheckpointMetadataOutputStream.java:65)
    at org.apache.flink.runtime.state.filesystem.FsCheckpointStorageLocation.createMetadataOutputStream(FsCheckpointStorageLocation.java:104)
    at org.apache.flink.runtime.checkpoint.PendingCheckpoint.finalizeCheckpoint(PendingCheckpoint.java:259)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:829)
    ... 8 more
{code}
Instead, it should either just overwrite the existing checkpoint or refuse to start the job at all. A partial, undefined failure partway through the run is not what should happen.
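The two behaviours requested above can be illustrated with a local-filesystem analogue built on `java.nio.file`. This is a hypothetical sketch, not Flink's code. The class name, the method names and the `overwrite` switch are all illustrative assumptions. Only two details come from the trace: the `<jobId>/chk-<n>/_metadata` layout and the all-zero job ID. The sketch also assumes, as the trace implies, that a full restart without HA reuses the same job ID and starts checkpoint numbering over.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.stream.Stream;

// Hypothetical illustration only; not part of Flink.
public class CheckpointMetadataPolicy {

    // All-zero job ID, as seen in the checkpoint path from the stack trace.
    static final String ZERO_JOB_ID = "00000000000000000000000000000000";

    // <root>/<jobId>/chk-<n>/_metadata, the layout visible in the stack trace.
    static Path metadataPath(Path root, String jobId, long checkpointId) {
        return root.resolve(jobId).resolve("chk-" + checkpointId).resolve("_metadata");
    }

    // Fail-fast variant (hypothetical): check once, before the job starts, whether the
    // job's directory already holds chk-* directories left over from an earlier run.
    static boolean hasStaleCheckpoints(Path jobDir) throws IOException {
        if (!Files.isDirectory(jobDir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(jobDir)) {
            return entries.anyMatch(p -> Files.isDirectory(p)
                    && p.getFileName().toString().startsWith("chk-"));
        }
    }

    // Writes _metadata. With overwrite=false the file must not exist yet (CREATE_NEW),
    // so the write fails mid-run like in the trace; overwrite=true replaces the file.
    static void writeMetadata(Path file, byte[] bytes, boolean overwrite) throws IOException {
        Files.createDirectories(file.getParent());
        if (overwrite) {
            // No options given: behaves as CREATE, TRUNCATE_EXISTING, WRITE.
            Files.write(file, bytes);
        } else {
            // Throws FileAlreadyExistsException if the file already exists.
            Files.write(file, bytes, StandardOpenOption.CREATE_NEW);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("checkpoints");
        Path jobDir = root.resolve(ZERO_JOB_ID);
        Path chk16 = metadataPath(root, ZERO_JOB_ID, 16);

        // First run: checkpoint 16 completes normally.
        writeMetadata(chk16, "run-1".getBytes(StandardCharsets.UTF_8), false);

        // Second run after a full restart (assumed): same job ID, checkpoint IDs start
        // over, so the job eventually tries to write chk-16 again.
        if (hasStaleCheckpoints(jobDir)) {
            System.out.println("fail-fast check would refuse to start: " + jobDir);
        }
        try {
            writeMetadata(chk16, "run-2".getBytes(StandardCharsets.UTF_8), false);
        } catch (FileAlreadyExistsException e) {
            System.out.println("no-overwrite write failed mid-run: " + e.getFile());
        }
        writeMetadata(chk16, "run-2".getBytes(StandardCharsets.UTF_8), true);
        System.out.println("after overwrite: "
                + new String(Files.readAllBytes(chk16), StandardCharsets.UTF_8));
    }
}
{code}

The fail-fast check inspects the whole job directory rather than a single checkpoint number. A collision such as chk-16 can surface long after startup, and by then older checkpoints may already have been cleaned up.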



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)