Posted to dev@zookeeper.apache.org by "Thawan Kooburat (JIRA)" <ji...@apache.org> on 2014/07/02 03:34:25 UTC
[jira] [Commented] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049515#comment-14049515 ] 

Thawan Kooburat commented on ZOOKEEPER-1549:
--------------------------------------------

Here is a recap of the issue for those who have just found this JIRA.

Problem:
When the leader starts, it treats every txn in its txnlog as committed and applies all of them to its in-memory data tree, even though some of them were acked only by the leader (or by a minority).

If a follower needs to synchronize with the leader via snapshot, the follower will receive a snapshot containing uncommitted txns and write this dirty snapshot to disk. If the leader then fails, the uncommitted txns may be discarded in the next leader election round, so this follower is left with a dirty snapshot on disk and has no way to recover from it.

The solution so far:
The fix on the follower side is simply to not take a snapshot until the quorum switches to the broadcast phase. The follower can hold a dirty snapshot in memory, but as long as it doesn't write it to disk we are OK, and part of the issue is addressed.
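
To illustrate the follower-side idea, here is a minimal Java sketch. It is not the attached patch: the class and method names (DeferredSnapshotSketch, Follower, receiveSnapshotFromLeader, onBroadcastPhaseStarted, persistedSnapshot) are hypothetical, and the data tree is modeled as a plain list of zxids using the two-digit zxids from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative and are not ZooKeeper classes.
final class DeferredSnapshotSketch {

    static final class Follower {
        private List<Long> inMemoryTree = new ArrayList<>();   // zxids applied in memory
        private List<Long> snapshotOnDisk = new ArrayList<>(); // zxids in the on-disk snapshot

        // Synchronization phase: load the leader's state into memory only.
        // The state may contain txns that were never acked by a quorum.
        void receiveSnapshotFromLeader(List<Long> leaderTree) {
            inMemoryTree = new ArrayList<>(leaderTree);
            // Deliberately no disk write here.
        }

        // The quorum has switched to the broadcast phase: persist the synced state.
        void onBroadcastPhaseStarted() {
            snapshotOnDisk = new ArrayList<>(inMemoryTree);
        }

        // Copy of what a restart would find on disk.
        List<Long> persistedSnapshot() {
            return new ArrayList<>(snapshotOnDisk);
        }
    }

    public static void main(String[] args) {
        Follower f = new Follower();
        // Assumed example state: txn 74 is in the leader's tree but uncommitted.
        f.receiveSnapshotFromLeader(List.of(71L, 72L, 73L, 74L));
        System.out.println("on disk during sync: " + f.persistedSnapshot());
        f.onBroadcastPhaseStarted();
        System.out.println("on disk after broadcast starts: " + f.persistedSnapshot());
    }
}
```

If the follower crashes before onBroadcastPhaseStarted() is called, only the in-memory state is lost, so nothing dirty survives a restart.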

On the leader side, the proposed patch changes the server startup and synchronization sequence. Uncommitted txns (any txn after the last snapshot) should never be applied to the data tree until the synchronization phase is done. We use the synchronization phase to catch up all followers, which implies that all of the followers have accepted these txns. Then we apply these txns before starting the broadcast phase.
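
The leader-side sequence can be sketched in the same spirit. Again, this is only an illustration under assumptions, not the proposed patch: LeaderStartupSketch, Leader, snapshotForFollower, txnsToSync and onSynchronizationDone are hypothetical names, and the data tree is a list of zxids.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: illustrative names, not ZooKeeper's actual leader code.
final class LeaderStartupSketch {

    static final class Leader {
        private final List<Long> dataTree = new ArrayList<>();  // zxids applied in memory
        private final List<Long> unapplied = new ArrayList<>(); // txns logged after the last snapshot

        // Startup: restore from the last snapshot only and hold back later txnlog entries.
        Leader(List<Long> lastSnapshot, List<Long> txnLogAfterSnapshot) {
            dataTree.addAll(lastSnapshot);
            unapplied.addAll(txnLogAfterSnapshot);
        }

        // State a follower would receive if it synchronizes via snapshot.
        List<Long> snapshotForFollower() {
            return new ArrayList<>(dataTree);
        }

        // Txns the followers still have to accept during synchronization.
        List<Long> txnsToSync() {
            return new ArrayList<>(unapplied);
        }

        // Synchronization is done: apply the held-back txns before broadcast starts.
        void onSynchronizationDone() {
            dataTree.addAll(unapplied);
            unapplied.clear();
        }
    }

    public static void main(String[] args) {
        // Assumed example state: snapshot covers 71..73, txn 74 is only in the txnlog.
        Leader leader = new Leader(List.of(71L, 72L, 73L), List.of(74L));
        System.out.println("snapshot during sync: " + leader.snapshotForFollower());
        System.out.println("txns to sync: " + leader.txnsToSync());
        leader.onSynchronizationDone();
        System.out.println("tree before broadcast: " + leader.snapshotForFollower());
    }
}
```

In this model a snapshot handed out during synchronization never contains txn 74, so a follower cannot persist it as part of a dirty snapshot.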
 
I will try to find someone on my team to help with this.


> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1549
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3
>            Reporter: Jacky007
>            Assignee: Thawan Kooburat
>            Priority: Blocker
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1549-3.4.patch, ZOOKEEPER-1549-learner.patch, case.patch
>
>
> The trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot is not correct.
> Here is the scenario (similar to 1154):
> Initial Condition
> 1.	Let's say there are three nodes in the ensemble, A, B and C, with A being the leader
> 2.	The current epoch is 7.
> 3.	For simplicity of the example, let's say the zxid is a two-digit number, with the epoch being the first digit.
> 4.	The zxid is 73
> 5.	All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> A request with zxid 74 is issued. The leader A writes it to its log, but the entire ensemble crashes and B and C never write change 74 to their logs.
> Step 2
> A and B restart, and A is elected as the new leader. A loads its data and takes a clean snapshot (change 74 is in it), then sends a diff to B, but B dies before it syncs with A. A dies later.
> Step 3
> B and C restart; A is still down
> B and C form the quorum
> B is the new leader. Let's say B's minCommitLog is 71 and maxCommitLog is 73
> epoch is now 8, zxid is 80
> Request with zxid 81 is successful. On B, minCommitLog is now 71, maxCommitLog is 81
> Step 4
> A starts up. It applies the change in request with zxid 74 to its in-memory data tree
> A contacts B to registerAsFollower and provides 74 as its ZxId
> Since 71<=74<=81, B decides to send A the diff. 
> Problem:
> The problem with the above sequence is that after truncating the log, A will load the snapshot again, and that snapshot is not correct.
> In the 3.3 branch, FileTxnSnapLog.restore does not call the listener (ZOOKEEPER-874), so the leader will send a snapshot to the follower and this is not a problem.
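
For readers following Step 4 of the quoted description, the range test that leads B to choose a diff can be written out as below. This is a hedged sketch using the simplified two-digit zxids from the scenario: DiffRangeCheckSketch and inCommitLogRange are hypothetical names, and the listed contents of B's committed log are an assumption made for illustration.

```java
import java.util.List;

// Hypothetical sketch of the range test from Step 4; not ZooKeeper's actual code.
final class DiffRangeCheckSketch {

    // True when the peer's zxid falls inside [minCommitLog, maxCommitLog].
    static boolean inCommitLogRange(long minCommitLog, long maxCommitLog, long peerZxid) {
        return minCommitLog <= peerZxid && peerZxid <= maxCommitLog;
    }

    public static void main(String[] args) {
        // Assumed contents of B's committed log after the request with zxid 81 succeeds.
        List<Long> committedOnB = List.of(71L, 72L, 73L, 81L);
        long zxidReportedByA = 74L;

        // The numeric test passes because 71 <= 74 <= 81 ...
        System.out.println("in range: " + inCommitLogRange(71L, 81L, zxidReportedByA));
        // ... even though B never committed a txn with zxid 74.
        System.out.println("committed on B: " + committedOnB.contains(zxidReportedByA));
    }
}
```

The range test alone cannot tell that zxid 74 was never committed by the quorum, which is why A's state still carries that change after the sync.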



--
This message was sent by Atlassian JIRA
(v6.2#6252)