Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/03/02 16:52:00 UTC
[jira] [Commented] (FLINK-8807) ZookeeperCompleted checkpoint store can get stuck in infinite loop
[ https://issues.apache.org/jira/browse/FLINK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383803#comment-16383803 ]
ASF GitHub Bot commented on FLINK-8807:
---------------------------------------
GitHub user aljoscha opened a pull request:
https://github.com/apache/flink/pull/5623
[FLINK-8807] Fix ZookeeperCompleted checkpoint store can get stuck in infinite loop
Before, CompletedCheckpoint did not have proper equals()/hashCode(),
which meant that the fixpoint condition in
ZooKeeperCompletedCheckpointStore would never hold if at least one
checkpoint became unreadable.
This adds proper equals()/hashCode() to CompletedCheckpoint and extends
the test to properly create new CompletedCheckpoints. Before, we were
reusing the same CompletedCheckpoint instances, meaning that
the identity-based Object.equals()/hashCode() would make the test succeed.
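To illustrate the failure mode described above, here is a minimal, self-contained
sketch. It is not Flink's actual code: FixpointSketch, IdentityCheckpoint,
ValueCheckpoint, readAll() and passesUntilStable() are hypothetical names, and the
loop is a simplified stand-in for the store's recovery loop. It assumes each pass
re-reads the checkpoints as fresh objects and stops once two consecutive passes
yield equal lists. A pass cap is added so the sketch terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

public class FixpointSketch {

    /** Hypothetical checkpoint that inherits Object's identity-based equals()/hashCode(). */
    static final class IdentityCheckpoint {
        final long id;
        IdentityCheckpoint(long id) { this.id = id; }
    }

    /** Hypothetical checkpoint with value-based equals()/hashCode(). */
    static final class ValueCheckpoint {
        final long id;
        ValueCheckpoint(long id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValueCheckpoint)) return false;
            return id == ((ValueCheckpoint) o).id;
        }

        @Override
        public int hashCode() { return Long.hashCode(id); }
    }

    /** Simulates one read pass: creates fresh instances and skips the unreadable checkpoint. */
    static <T> List<T> readAll(long[] ids, long unreadableId, LongFunction<T> factory) {
        List<T> result = new ArrayList<>();
        for (long id : ids) {
            if (id != unreadableId) {
                result.add(factory.apply(id));
            }
        }
        return result;
    }

    /**
     * Repeats read passes until two consecutive passes produce equal lists.
     * Returns the number of passes needed, or -1 if maxPasses is exhausted.
     */
    static <T> int passesUntilStable(long[] ids, long unreadableId,
                                     LongFunction<T> factory, int maxPasses) {
        List<T> previous = null;
        for (int pass = 1; pass <= maxPasses; pass++) {
            List<T> current = readAll(ids, unreadableId, factory);
            if (current.equals(previous)) {
                return pass;
            }
            previous = current;
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L};
        // Identity-based equality: fresh instances never compare equal, so no fixpoint.
        System.out.println(passesUntilStable(ids, 2L, IdentityCheckpoint::new, 10)); // prints -1
        // Value-based equality: the second pass matches the first.
        System.out.println(passesUntilStable(ids, 2L, ValueCheckpoint::new, 10));    // prints 2
    }
}
```

Without the pass cap, the identity-based variant would loop forever. A test that
reuses the same instances on every pass would hide this, because identity
comparison succeeds for identical references.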
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/aljoscha/flink jira-8807-zookeeper-fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5623.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5623
----
commit 777ddb57ee72d200d1312dc8e6dfdb52af6b9950
Author: Aljoscha Krettek <al...@...>
Date: 2018-03-02T16:46:56Z
[FLINK-8807] Fix ZookeeperCompleted checkpoint store can get stuck in infinite loop
Before, CompletedCheckpoint did not have proper equals()/hashCode(),
which meant that the fixpoint condition in
ZooKeeperCompletedCheckpointStore would never hold if at least one
checkpoint became unreadable.
This adds proper equals()/hashCode() to CompletedCheckpoint and extends
the test to properly create new CompletedCheckpoints. Before, we were
reusing the same CompletedCheckpoint instances, meaning that
the identity-based Object.equals()/hashCode() would make the test succeed.
----
> ZookeeperCompleted checkpoint store can get stuck in infinite loop
> ------------------------------------------------------------------
>
> Key: FLINK-8807
> URL: https://issues.apache.org/jira/browse/FLINK-8807
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Affects Versions: 1.5.0
> Reporter: Aljoscha Krettek
> Priority: Blocker
> Fix For: 1.5.0
>
>
> This code: https://github.com/apache/flink/blob/9071e3befb8c279f73c3094c9f6bddc0e7cce9e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L201 can be stuck forever if at least one checkpoint is not readable because {{CompletedCheckpoint}} does not have a proper {{equals()}}/{{hashCode()}} anymore.
> We have to fix this and also add a unit test that verifies the loop still works if we make one snapshot unreadable.