Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/03/02 16:52:00 UTC

[jira] [Commented] (FLINK-8807) ZookeeperCompleted checkpoint store can get stuck in infinite loop

    [ https://issues.apache.org/jira/browse/FLINK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383803#comment-16383803 ] 

ASF GitHub Bot commented on FLINK-8807:
---------------------------------------

GitHub user aljoscha opened a pull request:

    https://github.com/apache/flink/pull/5623

    [FLINK-8807] Fix ZookeeperCompleted checkpoint store can get stuck in infinite loop

    Before, CompletedCheckpoint did not have proper equals()/hashCode(),
    which meant that the fixpoint condition in
    ZooKeeperCompletedCheckpointStore would never hold if at least one
    checkpoint became unreadable.
    
    This adds proper equals()/hashCode() to CompletedCheckpoint and extends
    the test to properly create new CompletedCheckpoints. Before, we were
    reusing the same CompletedCheckpoint instances, so the identity-based
    Object.equals()/hashCode() still made the test succeed.
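
To illustrate why reusing instances hid the problem, here is a minimal sketch. It does not use the Flink classes: IdentityCheckpoint and ValueCheckpoint are hypothetical stand-ins for CompletedCheckpoint without and with equals()/hashCode(), and the single checkpointId field is an assumption made for brevity.

```java
import java.util.Collections;
import java.util.List;

class EqualityDemo {

    /** Hypothetical stand-in without equals()/hashCode(): compared by identity. */
    static final class IdentityCheckpoint {
        final long checkpointId;

        IdentityCheckpoint(long checkpointId) {
            this.checkpointId = checkpointId;
        }
    }

    /** Hypothetical stand-in with value-based equals()/hashCode(). */
    static final class ValueCheckpoint {
        final long checkpointId;

        ValueCheckpoint(long checkpointId) {
            this.checkpointId = checkpointId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof ValueCheckpoint)) {
                return false;
            }
            return checkpointId == ((ValueCheckpoint) o).checkpointId;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(checkpointId);
        }
    }

    /** The same instance in both lists: identity equality is enough, so the bug stays hidden. */
    public static boolean reusedInstancesLookEqual() {
        IdentityCheckpoint cp = new IdentityCheckpoint(1L);
        List<IdentityCheckpoint> first = Collections.singletonList(cp);
        List<IdentityCheckpoint> second = Collections.singletonList(cp);
        return first.equals(second);
    }

    /** Fresh instances per read, as after deserialization: identity equality fails. */
    public static boolean freshIdentityInstancesLookEqual() {
        List<IdentityCheckpoint> first = Collections.singletonList(new IdentityCheckpoint(1L));
        List<IdentityCheckpoint> second = Collections.singletonList(new IdentityCheckpoint(1L));
        return first.equals(second);
    }

    /** Fresh instances with value-based equality compare equal again. */
    public static boolean freshValueInstancesLookEqual() {
        List<ValueCheckpoint> first = Collections.singletonList(new ValueCheckpoint(1L));
        List<ValueCheckpoint> second = Collections.singletonList(new ValueCheckpoint(1L));
        return first.equals(second);
    }

    public static void main(String[] args) {
        System.out.println(reusedInstancesLookEqual());        // true
        System.out.println(freshIdentityInstancesLookEqual()); // false
        System.out.println(freshValueInstancesLookEqual());    // true
    }
}
```

A test that builds a new object for every simulated read, as in the second method, is the kind that would have exposed the missing equals()/hashCode().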

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink jira-8807-zookeeper-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5623.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5623
    
----
commit 777ddb57ee72d200d1312dc8e6dfdb52af6b9950
Author: Aljoscha Krettek <al...@...>
Date:   2018-03-02T16:46:56Z

    [FLINK-8807] Fix ZookeeperCompleted checkpoint store can get stuck in infinite loop
    
    Before, CompletedCheckpoint did not have proper equals()/hashCode(),
    which meant that the fixpoint condition in
    ZooKeeperCompletedCheckpointStore would never hold if at least one
    checkpoint became unreadable.
    
    This adds proper equals()/hashCode() to CompletedCheckpoint and extends
    the test to properly create new CompletedCheckpoints. Before, we were
    reusing the same CompletedCheckpoint instances, so the identity-based
    Object.equals()/hashCode() still made the test succeed.

----


> ZookeeperCompleted checkpoint store can get stuck in infinite loop
> ------------------------------------------------------------------
>
>                 Key: FLINK-8807
>                 URL: https://issues.apache.org/jira/browse/FLINK-8807
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> This code: https://github.com/apache/flink/blob/9071e3befb8c279f73c3094c9f6bddc0e7cce9e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L201 can be stuck forever if at least one checkpoint is not readable because {{CompletedCheckpoint}} does not have a proper {{equals()}}/{{hashCode()}} anymore.
> We have to fix this and also add a unit test that verifies the loop still works if we make one snapshot unreadable.
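
The loop described in the issue can be sketched roughly as follows. This is a simplified illustration, not the code in ZooKeeperCompletedCheckpointStore: the Checkpoint class, readAll(), attemptsUntilStable(), and the maxAttempts guard are all hypothetical names introduced here. The idea is to keep re-reading until either every checkpoint is readable or two consecutive reads agree. The second condition is the one that depends on value-based equals().

```java
import java.util.ArrayList;
import java.util.List;

class FixpointDemo {

    /** Hypothetical checkpoint with value-based equality (the fixed behaviour). */
    static final class Checkpoint {
        final long id;

        Checkpoint(long id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Checkpoint && ((Checkpoint) o).id == id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }
    }

    /**
     * Simulates one read pass: skips the unreadable id and creates fresh
     * objects on every call, as deserialization would.
     */
    static List<Checkpoint> readAll(long[] ids, long unreadableId) {
        List<Checkpoint> result = new ArrayList<>();
        for (long id : ids) {
            if (id != unreadableId) {
                result.add(new Checkpoint(id));
            }
        }
        return result;
    }

    /**
     * Repeats read passes until all checkpoints were read or two consecutive
     * passes returned equal lists. Returns the number of passes needed, or -1
     * if maxAttempts is exhausted (a guard added for this sketch only).
     */
    public static int attemptsUntilStable(long[] ids, long unreadableId, int maxAttempts) {
        List<Checkpoint> previous = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<Checkpoint> current = readAll(ids, unreadableId);
            if (current.size() == ids.length || current.equals(previous)) {
                return attempt;
            }
            previous = current;
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L};
        System.out.println(attemptsUntilStable(ids, 2L, 10));  // 2: second pass matches the first
        System.out.println(attemptsUntilStable(ids, -1L, 10)); // 1: everything was readable
    }
}
```

If Checkpoint fell back to identity-based Object.equals(), the first call would never see two equal passes and would only stop at the maxAttempts guard. Without such a guard, that is the infinite loop reported here.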



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)