Posted to dev@zookeeper.apache.org by "anaud (Jira)" <ji...@apache.org> on 2020/10/15 15:25:00 UTC

[jira] [Created] (ZOOKEEPER-3972) Convergence fail when a follower tries to resync with a leader having incomplete commitlog

anaud created ZOOKEEPER-3972:
--------------------------------

             Summary: Convergence fail when a follower tries to resync with a leader having incomplete commitlog
                 Key: ZOOKEEPER-3972
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3972
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.5.8
            Reporter: anaud
         Attachments: zookeeper-testResyncWithLeaderHavingIncompleteCommitlog.patch

A leader may have an incomplete commitlog because it resynced with the previous leader via SNAPSHOT replication.

A follower may then try to resync with this leader while still missing some earlier transactions that the leader does not have in its commitlog.

In that case the leader and the follower settle on txnlog + commitlog to resync. This leads to convergence failure, because the leader never sends the missing transactions that are absent from its commitlog.

Below are the abstract steps to reproduce the bug. I have attached a patch with a test case that reproduces it.

Initially, nodes A, B, and C are all in sync.
1. Node A crashes; setData 0x11 on B and C
2. Node B and C crash
3. Node A and B restart
4. Node A crashes; setData 0x21 on B
5. Node B crashes
6. Node B and C restart
7. Node C crashes; setData 0x32 on B
8. Node A and C restart
9. Node B restarts


At step 6, C is a follower that gets a snapshot from B, so C has transaction 0x21 only in its snapshot and not in its commitlog.

At step 8, C is the leader but does not have 0x21 in its commitlog, so A never receives it.

In the end, 0x21 only exists on B and C, but not on A.
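
The timeline above can be pictured with a small toy model. This is only a sketch under stated assumptions, not ZooKeeper's actual sync code: the Node class, its method names, and the choice of leader and sync mode at steps 3, 6, and 8 are hypothetical simplifications. Each node tracks the transactions reflected in its data ("applied") separately from the transactions it could replay to a peer ("log"). Quorum, epochs, and whatever happens when B rejoins at step 9 are not modeled.

```python
class Node:
    """Toy replica (hypothetical model, not ZooKeeper code)."""

    def __init__(self, name):
        self.name = name
        self.applied = set()  # zxids reflected in this node's data
        self.log = []         # zxids this node could replay to a peer

    def last_zxid(self):
        return max(self.applied, default=0)

    def write(self, zxid):
        # A normal write is both applied and logged.
        self.applied.add(zxid)
        self.log.append(zxid)

    def snapshot_sync_from(self, leader):
        # Snapshot replication copies the data state only; nothing is logged.
        self.applied = set(leader.applied)

    def log_sync_from(self, leader):
        # Log-based resync replays only what the leader holds in its log.
        start = self.last_zxid()
        for zxid in leader.log:
            if zxid > start:
                self.write(zxid)


a, b, c = Node("A"), Node("B"), Node("C")

# Step 1: A is down; setData 0x11 lands on B and C.
b.write(0x11)
c.write(0x11)

# Step 3: A and B restart; assume B leads and A catches up from B's log.
a.log_sync_from(b)

# Step 4: A is down; setData 0x21 lands on B only.
b.write(0x21)

# Step 6: B and C restart; C receives a snapshot from B.
c.snapshot_sync_from(b)

# Step 7: C is down; setData 0x32 lands on B only.
b.write(0x32)

# Step 8: A and C restart; C leads and A resyncs from C's log.
a.log_sync_from(c)

# C's data contains 0x21, but its log does not, so A never receives it.
assert c.log == [0x11]
assert 0x21 in b.applied and 0x21 in c.applied
assert 0x21 not in a.applied
```

In this model the divergence comes entirely from step 6: the snapshot puts 0x21 into C's data without putting it into anything C can later replay, so a log-based resync from C silently skips it.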

 


