Posted to issues@zookeeper.apache.org by "maoling (Jira)" <ji...@apache.org> on 2020/09/21 07:35:00 UTC

[jira] [Issue Comment Deleted] (ZOOKEEPER-2832) Data Inconsistency occurs if follower has uncommitted transaction in the log while synchronizing with the leader that has the lower last processed zxid

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

maoling updated ZOOKEEPER-2832:
-------------------------------
    Comment: was deleted

(was: [~benkim] [~fniksic]

I applied this test patch on the master branch, and the test still fails:
{code:java}
[INFO] Running org.apache.zookeeper.server.quorum.QuorumPeerMainTest
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 30.485 s <<< FAILURE! - in org.apache.zookeeper.server.quorum.QuorumPeerMainTest
[ERROR] testDivergenceResync  Time elapsed: 30.403 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: 0 ==> expected: <Expecting the value of the 1st key on 1st and 2nd servers should be same> but was: <0>
        at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testDivergenceResync(QuorumPeerMainTest.java:2042)

{code}
I will dig into it to find out whether this is a problem with the test patch or a stubborn bug in the current ZooKeeper codebase.)

> Data Inconsistency occurs if follower has uncommitted transaction in the log while synchronizing with the leader that has the lower last processed zxid
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2832
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2832
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.9
>            Reporter: Beom Heyn Kim
>            Priority: Major
>             Fix For: 3.4.10
>
>         Attachments: zookeeper-2832.patch
>
>
> Synchronization code may fail to truncate an uncommitted transaction in the follower’s transaction log. Here is a scenario:
>  
> Initial condition:
> Start the ensemble with three nodes A, B and C with C being the leader
> The current epoch is 1
> For simplicity of the example, let’s say zxid is a two digit number, with epoch being the first digit
> Create two znodes ‘key0’ and ‘key1’ whose values are ‘0’ and ‘1’, respectively
> The zxid is 12 -- 11 for creating key0 and 12 for creating key1. (For simplicity of the example, the zxid gets increased only by transactions directly changing the data of znodes.)
> All the nodes have seen the change 12 and have persistently logged it
> Shut down all
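
To make the simplified numbering above concrete, here is a minimal sketch of the two-digit zxid used in this example only. The class and method names (ToyZxid, make, epoch, counter) are hypothetical and are not ZooKeeper APIs; they just encode "first digit is the epoch, second digit is the counter".

```java
/** Toy two-digit zxid from the example: first digit = epoch, second digit = counter. */
final class ToyZxid {
    private ToyZxid() {}

    /** Builds a toy zxid; both parts must be single digits in this simplified model. */
    static int make(int epoch, int counter) {
        if (epoch < 0 || epoch > 9 || counter < 0 || counter > 9) {
            throw new IllegalArgumentException("toy zxid parts must be single digits");
        }
        return epoch * 10 + counter;
    }

    /** Epoch part (first digit) of a toy zxid. */
    static int epoch(int zxid) {
        return zxid / 10;
    }

    /** Counter part (second digit) of a toy zxid. */
    static int counter(int zxid) {
        return zxid % 10;
    }

    public static void main(String[] args) {
        int createKey1 = make(1, 2); // second transaction of epoch 1
        int setKey0 = make(2, 1);    // first transaction of epoch 2
        System.out.println(createKey1 + " has epoch " + epoch(createKey1));
        System.out.println(setKey0 + " has epoch " + epoch(setKey0));
    }
}
```

With this encoding, comparing two toy zxids as plain integers compares the epoch first and the counter second, which is how the comparisons in the steps below (for example 20 > 12) should be read.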
>  
> Step 1
> Start Nodes A and B. Epoch becomes 2. Then a request, setData(key0, 1000), with zxid 21 is issued. The leader B writes it to its log, but Node A is shut down before writing it to its log. Then the leader B is also shut down. The change 21 is applied only to B, not to A or C.
>  
> Step 2
> Start Nodes A and C. Epoch becomes 3. Node A has a higher zxid than Node C (i.e. 20 > 12), so Node A becomes the leader. Yet the last processed zxid is 12 for both Node A and Node C, so they are already in sync. Node A sends an empty DIFF to Node C. Node C takes a snapshot and creates snapshot.12. Then A and C are shut down. Now Node C has a higher zxid than Node B.
>  
> Step 3
> Start Nodes B and C. Epoch becomes 4. Node C has a higher zxid than Node B (i.e. 30 > 21), so Node C becomes the leader. Nodes B and C have different last processed zxids (i.e. 21 vs 12), and the LinkedList object ‘proposals’ is empty. Thus, Node C sends SNAP to Node B. Node B takes a clean snapshot and creates snapshot.12, as zxid 12 is the last processed zxid of the leader C. (Note that the newly created snapshot on B is assigned a lower zxid than the change 21 in the log.) Then the request, setData(key1, 1001), with zxid 41 is issued. Both B and C write the change 41 to their logs. (Note that B and C now have the same last processed zxid.) Then B and C are shut down.
>  
> Step 4
> Start Nodes B and C. Epoch becomes 5. Nodes B and C use their local log and snapshot files to restore their in-memory data trees. Node B has 1000 as the value of key0, because its latest valid snapshot is snapshot.12 and there is a later transaction with zxid 21 in its log. Yet Node C has 0 as the value of key0, because the change 21 was never written on C. Node C is the leader. Nodes B and C have the same last processed zxid, i.e. 41, so they are considered to be in sync already, and Node C sends an empty DIFF to Node B. The synchronization therefore completes with the initially restored in-memory data trees on B and C.
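
The restore in Step 4 can be sketched with a toy model: copy the snapshot, then replay every logged transaction whose zxid is greater than the snapshot's zxid. Everything here (ToyRestore, Txn, snapshot12, logOnB, logOnC) is hypothetical and only mirrors the example's numbers; it is not the ZooKeeper recovery code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of restoring a data tree from a snapshot plus a transaction log. */
final class ToyRestore {
    /** A logged write: at this zxid, key is set to value. */
    static final class Txn {
        final int zxid;
        final String key;
        final String value;

        Txn(int zxid, String key, String value) {
            this.zxid = zxid;
            this.key = key;
            this.value = value;
        }
    }

    /** Copies the snapshot, then replays every logged txn newer than the snapshot. */
    static Map<String, String> restore(Map<String, String> snapshot, int snapshotZxid, List<Txn> log) {
        Map<String, String> tree = new TreeMap<>(snapshot);
        for (Txn txn : log) {
            if (txn.zxid > snapshotZxid) {
                tree.put(txn.key, txn.value);
            }
        }
        return tree;
    }

    /** Contents of snapshot.12 in the example: key0=0, key1=1. */
    static Map<String, String> snapshot12() {
        Map<String, String> snap = new TreeMap<>();
        snap.put("key0", "0");
        snap.put("key1", "1");
        return snap;
    }

    /** B's log still contains the never-truncated change 21. */
    static List<Txn> logOnB() {
        return Arrays.asList(
                new Txn(11, "key0", "0"),
                new Txn(12, "key1", "1"),
                new Txn(21, "key0", "1000"),
                new Txn(41, "key1", "1001"));
    }

    /** C never saw the change 21. */
    static List<Txn> logOnC() {
        return Arrays.asList(
                new Txn(11, "key0", "0"),
                new Txn(12, "key1", "1"),
                new Txn(41, "key1", "1001"));
    }

    public static void main(String[] args) {
        System.out.println("B: " + restore(snapshot12(), 12, logOnB()));
        System.out.println("C: " + restore(snapshot12(), 12, logOnC()));
    }
}
```

In this model both nodes end with 41 as their newest replayed zxid, yet B's tree holds key0=1000 while C's holds key0=0, because the stale entry 21 on B sits above the zxid of snapshot.12.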
>  
> Problem
> The value of key0 on Node B is 1000, while the value of key0 on Node C is 0. LearnerHandler.run on C at Step 3 never sends TRUNC, only SNAP, so the change 21 was never truncated from B's log. Then, at Step 4, since B uses a snapshot with a lower zxid to restore its in-memory data tree, the change 21 gets into the data tree. The leader C at Step 4 did not send SNAP, because the change 41 made to both B and C makes the leader C think that B and C are already in sync. Thus, data inconsistency occurs.
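
The leader-side decisions described in Steps 2 to 4 can be summarised with a deliberately tiny sketch. It models only the two branches this report mentions (equal last processed zxids give an empty DIFF; different zxids with an empty ‘proposals’ list give SNAP) and is not the actual LearnerHandler logic. ToySync, Mode and decide are hypothetical names.

```java
/** Toy model of the two leader-side sync branches described in this report. */
final class ToySync {
    enum Mode { DIFF, SNAP }

    private ToySync() {}

    /**
     * Picks a sync mode from the last processed zxids only.
     * Branches that the scenario does not describe are intentionally not modelled.
     */
    static Mode decide(int leaderLastZxid, int followerLastZxid, boolean proposalsEmpty) {
        if (followerLastZxid == leaderLastZxid) {
            return Mode.DIFF; // treated as already in sync: empty DIFF
        }
        if (proposalsEmpty) {
            return Mode.SNAP; // Step 3: follower's log is left untouched
        }
        throw new UnsupportedOperationException("branch not covered by the scenario");
    }

    public static void main(String[] args) {
        System.out.println("Step 2 (A leads C): " + decide(12, 12, true));
        System.out.println("Step 3 (C leads B): " + decide(12, 21, true));
        System.out.println("Step 4 (C leads B): " + decide(41, 41, true));
    }
}
```

The point of the sketch is that a decision based only on the last processed zxid cannot see the stale entry 21 in B's log at Step 4, since both sides report 41.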
>  
> The attached test case can deterministically reproduce the bug.


