You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Hongchao Deng (JIRA)" <ji...@apache.org> on 2014/08/23 02:18:12 UTC
[jira] [Commented] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107742#comment-14107742 ] 

Hongchao Deng commented on ZOOKEEPER-1549:
------------------------------------------

[~fpj][~thawan][~fanster.z]
{quote}
The fix on the follower side is to simply not taking snapshot until the quorum switch to broadcast phase. The follower can have dirty snapshot in memory but as long as it doesn’t write to disk, we are ok and part of the issue is addressed.

On the leader side, the proposed patch is to change server startup and synchronization sequence. Uncommitted txn (any txn after the last snapshot) should never get applied to the data tree until synchronization phase is done. We use synchronization phase to catchup all follower and imply that all of the follower accepted the txn. Then, we apply these txns before starting broadcast phase.
{quote}

Is this issue fixed yet? Or does anyone have internal patch for this?
This issue seems kinda major thing for stability.

> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1549
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3
>            Reporter: Jacky007
>            Assignee: Thawan Kooburat
>            Priority: Blocker
>             Fix For: 3.5.1
>
>         Attachments: ZOOKEEPER-1549-3.4.patch, ZOOKEEPER-1549-learner.patch, case.patch
>
>
> the trunc code (from ZOOKEEPER-1154?) cannot work correct if the snapshot is not correct.
> here is scenario(similar to 1154):
> Initial Condition
> 1.	Lets say there are three nodes in the ensemble A,B,C with A being the leader
> 2.	The current epoch is 7. 
> 3.	For simplicity of the example, lets say zxid is a two digit number, with epoch being the first digit.
> 4.	The zxid is 73
> 5.	All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> Request with zxid 74 is issued. The leader A writes it to the log but there is a crash of the entire ensemble and B,C never write the change 74 to their log.
> Step 2
> A,B restart, A is elected as the new leader,  and A will load data and take a clean snapshot(change 74 is in it), then send diff to B, but B died before sync with A. A died later.
> Step 3
> B,C restart, A is still down
> B,C form the quorum
> B is the new leader. Lets say B minCommitLog is 71 and maxCommitLog is 73
> epoch is now 8, zxid is 80
> Request with zxid 81 is successful. On B, minCommitLog is now 71, maxCommitLog is 81
> Step 4
> A starts up. It applies the change in request with zxid 74 to its in-memory data tree
> A contacts B to registerAsFollower and provides 74 as its ZxId
> Since 71<=74<=81, B decides to send A the diff. 
> Problem:
> The problem with the above sequence is that after truncate the log, A will load the snapshot again which is not correct.
> In 3.3 branch, FileTxnSnapLog.restore does not call listener(ZOOKEEPER-874), the leader will send a snapshot to follower, it will not be a problem.



--
This message was sent by Atlassian JIRA
(v6.2#6252)