Posted to dev@zookeeper.apache.org by "Fangmin Lv (JIRA)" <ji...@apache.org> on 2017/07/18 19:35:00 UTC

[jira] [Created] (ZOOKEEPER-2846) Leader follower sync with on disk txns can possibly lead to data inconsistency

Fangmin Lv created ZOOKEEPER-2846:
-------------------------------------

             Summary: Leader follower sync with on disk txns can possibly lead to data inconsistency
                 Key: ZOOKEEPER-2846
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2846
             Project: ZooKeeper
          Issue Type: Bug
          Components: quorum
    Affects Versions: 3.5.3, 3.4.10, 3.6.0
            Reporter: Fangmin Lv
            Priority: Critical


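On-disk txn sync can cause data inconsistency if the current leader went through a snap sync shortly before it became leader. The txns covered by that snapshot are missing from the leader's on-disk txn log, so a later diff sync with its followers that reads txns from disk may silently skip over that gap. Here is the scenario: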

Let's say S0 - S3 are followers, and S4 is the leader at the beginning:

1. Stop S2 and send one more request.
2. Stop S3 and send more requests to the quorum, so that S3 does a snap sync with S4 when it starts up again.
3. Stop S4; S3 becomes the new leader.
4. Start S2, which does a diff sync with S3. There are now gaps in S2.
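
The four steps above can be traced with a small toy model. This is only a sketch under loud assumptions: plain integers stand in for zxids, every class and function name here is hypothetical, and none of it mirrors ZooKeeper's actual sync code. It only shows how a snap sync leaves a hole in the new leader's on-disk log, and how a diff sync served from that log passes the hole on.

```python
# Toy model only: integers stand in for zxids, and none of these names
# come from the ZooKeeper code base.

class Server:
    def __init__(self, name):
        self.name = name
        self.log = []      # txns persisted in the on-disk txn log
        self.data = set()  # txns reflected in the in-memory state

    def apply(self, zxid):
        """Normal path: write the txn to the log and apply it."""
        self.log.append(zxid)
        self.data.add(zxid)

    def snap_sync_from(self, leader):
        """Snap sync: copy the leader's state. The skipped txns never
        reach this server's txn log."""
        self.data = set(leader.data)

    def diff_sync_from_disk(self, leader):
        """Diff sync served from the leader's on-disk txn log."""
        last = max(self.data)
        for zxid in leader.log:
            if zxid > last:
                self.apply(zxid)


def run_scenario():
    servers = {f"S{i}": Server(f"S{i}") for i in range(5)}
    up = set(servers)

    def request(zxid):
        for name in up:
            servers[name].apply(zxid)

    for zxid in (1, 2, 3):          # everyone starts in sync
        request(zxid)

    up.discard("S2")                # step 1: stop S2, one more request
    request(4)

    up.discard("S3")                # step 2: stop S3, more requests
    for zxid in (5, 6, 7, 8, 9):
        request(zxid)
    servers["S3"].snap_sync_from(servers["S4"])
    up.add("S3")

    up.discard("S4")                # step 3: stop S4, S3 becomes leader

    servers["S2"].diff_sync_from_disk(servers["S3"])  # step 4
    up.add("S2")

    missing = sorted(servers["S3"].data - servers["S2"].data)
    return servers, missing


if __name__ == "__main__":
    servers, missing = run_scenario()
    print("S3 on-disk log:", servers["S3"].log)
    print("txns missing on S2:", missing)
```

In this model S3 holds the state for txns 5 to 9 but never logged them, so S2 receives only txn 4 from S3's log and ends up missing 5 to 9.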

I have attached a test case that reproduces the issue. Currently, there is no efficient way to check whether a gap in the txn files is a real gap or is simply due to an epoch change. We need to add that support, but until then it would be safer to disable the on-disk txn leader-follower sync.
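
To illustrate why the two kinds of gap are hard to tell apart, here is a minimal sketch. It relies on the zxid layout (epoch in the high 32 bits, counter in the low 32 bits); the helper names and the classification labels are hypothetical and are not part of ZooKeeper. Within one epoch a jump in the counter is clearly a missing txn, but across an epoch boundary the zxids alone do not say whether txns at the end of the previous epoch are missing.

```python
# Hypothetical helpers for illustration; not ZooKeeper APIs.

def make_zxid(epoch, counter):
    return (epoch << 32) | counter

def epoch_of(zxid):
    return zxid >> 32

def counter_of(zxid):
    return zxid & 0xFFFFFFFF

def classify_jumps(zxids):
    """Return (prev, next, kind) for every non-consecutive pair of
    zxids in an ordered txn log."""
    jumps = []
    for prev, nxt in zip(zxids, zxids[1:]):
        if nxt == prev + 1:
            continue  # consecutive, nothing to report
        if epoch_of(prev) == epoch_of(nxt):
            kind = "gap"        # same epoch, counter skipped: txns missing
        else:
            kind = "ambiguous"  # epoch changed: cannot tell from zxids alone
        jumps.append((prev, nxt, kind))
    return jumps


if __name__ == "__main__":
    log = [make_zxid(1, 1), make_zxid(1, 2), make_zxid(1, 5), make_zxid(2, 1)]
    for prev, nxt, kind in classify_jumps(log):
        print(f"{epoch_of(prev)}:{counter_of(prev)} -> "
              f"{epoch_of(nxt)}:{counter_of(nxt)} {kind}")
```

Resolving the "ambiguous" case would need extra information beyond the zxids themselves, which is the support the report says is currently missing.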


