Posted to dev@zookeeper.apache.org by "Marshall McMullen (JIRA)" <ji...@apache.org> on 2014/06/06 01:05:02 UTC

[jira] [Commented] (ZOOKEEPER-1934) Stale data received from sync'd ensemble peer

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019389#comment-14019389 ] 

Marshall McMullen commented on ZOOKEEPER-1934:
----------------------------------------------

Diffing the zktreeutil dumps of each server is also interesting. Compared against node1, nodes 3, 4 and 5 show only a few minor differences, all in local sessions:

diff -a node1.zktree node3.zktree 
8933,8934d8932
<         |--[144115323715452941]
<         |   
9162,9163d9159
<         |   
<         |--[72058779056865292]

diff -a node1.zktree node4.zktree 
8933,8934d8932
<         |--[144115323715452941]
<         |   
9005,9006d9002
<         |--[216173168961912851]
<         |   
9162,9163d9157
<         |   
<         |--[72058779056865292]

diff -a node1.zktree node5.zktree 
8933,8934d8932
<         |--[144115323715452941]
<         |   
9005,9006d9002
<         |--[216173168961912851]
<         |   
9065,9066d9060
<         |--[288230547757793293]
<         |   
9162,9163d9155
<         |   
<         |--[72058779056865292]

node2's dump, by contrast, is MASSIVELY different.

In particular, the /binchanges value is different:

|--[binchanges]                                                 |--[binchanges]
|   |                                                           |   |   
|   |--[version => 3]                                         | |   |--[version => 2]
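
For reference, the server that created each of the sessions in the smaller diffs above can be read off the session id. The sketch below assumes ZooKeeper's usual session id layout, with the creating server's id in the top 8 bits; `server_id_of` is a hypothetical helper, not part of zktreeutil or ZooKeeper.

```python
# Session ids copied from the diffs above.
SESSION_IDS = [
    144115323715452941,
    216173168961912851,
    288230547757793293,
    72058779056865292,
]


def server_id_of(session_id: int) -> int:
    """Return the top 8 bits of a session id.

    Assumption: the server that created the session stores its own id there.
    """
    return (session_id >> 56) & 0xFF


for sid in SESSION_IDS:
    print(hex(sid), "created by server", server_id_of(sid))
```

Under that assumption the four sessions were created by servers 2, 3, 4 and 1 respectively.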




> Stale data received from sync'd ensemble peer
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1934
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1934
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Marshall McMullen
>         Attachments: node1.log, node2.log, node3.log, node4.log, node5.log
>
>
> In our regression testing we encountered an error wherein we were caching a value we read from ZooKeeper and then experienced session loss. We subsequently got reconnected to a different ZooKeeper server. When we tried to read the same path from this new server, we got a stale value.
> Specifically, we are reading "/binchanges" and originally got back a version of "4" from the first server. After we lost connection and reconnected before the session timeout, we then read "/binchanges" from the new server and got back a value of "3". 
> It's my understanding of the single system image guarantee that this should never be allowed. I realize that the single system image guarantee is still quorum based and it's certainly possible that a minority of the ensemble may have stale data. However, I also believe that each client has to send the highest zxid it's seen as part of its connection request to the server. And if the server it's connecting to has a smaller zxid than the value the client sends, then the connection request should be refused.
> Assuming I have all of that correct, then I'm at a loss for how this happened. 
> The failure happened around Jun 4 08:13:44. Just before that, at Jun 4 08:13:30, there was a round of leader election. During that round the ensemble elected the server with id=4 and zxid=0x300001c4c. This then led to a new zxid=0x400000001. The new leader sent a diff to all the servers, including the one we would soon read the stale data from (id=2). The log file for server id=2 also shows an UPTODATE message, indicating that as of 08:13:43 it was up to date and current.
> I'm going to attach log files from all 5 ensemble nodes. I also used zktreeutil to dump the database out for the 5 ensemble nodes. I diff'd those, and compared them all for correctness. One of the nodes (id=2) has a zktreeutil dump that diverges massively from those of the other 4 nodes, even though it received the diff from the new leader.
> In the attachments there are 5 nodes. I will number each log file by its ZooKeeper id, e.g. node4_zookeeper.log.
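
The zxid reasoning in the description above can be sketched as follows. This is only an illustration: it assumes the usual zxid layout (epoch in the high 32 bits, counter in the low 32 bits) and a simplified model of the server-side check the description refers to. `split_zxid` and `should_refuse_connection` are hypothetical names, not ZooKeeper APIs.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter)."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def should_refuse_connection(client_last_zxid: int, server_last_zxid: int) -> bool:
    """Simplified model of the check described above: a server that is
    behind the client's last seen zxid should refuse the connection."""
    return client_last_zxid > server_last_zxid


elected_zxid = 0x300001C4C    # zxid of the elected server (id=4), from the description
first_new_zxid = 0x400000001  # first zxid after the election, from the description

print(split_zxid(elected_zxid))    # (3, 7244): epoch 3, counter 0x1c4c
print(split_zxid(first_new_zxid))  # (4, 1): epoch 4, counter 1

# A client that has seen the new epoch should not be accepted by a server
# whose last zxid is still in the old epoch.
print(should_refuse_connection(first_new_zxid, elected_zxid))  # True
```

If the check works this way, a server accepts a client only when its own last zxid is at least the client's last seen zxid, so a stale read would need some other explanation.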



--
This message was sent by Atlassian JIRA
(v6.2#6252)