Posted to dev@zookeeper.apache.org by "Travis Crawford (JIRA)" <ji...@apache.org> on 2010/07/15 05:23:51 UTC

[jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888684#action_12888684 ] 

Travis Crawford commented on ZOOKEEPER-335:
-------------------------------------------

Unfortunately, I still observed the "Leader epoch" issue and had to force a leader election manually before the cluster recovered. This test was run on the following base release plus patches, applied in the order listed.

Zookeeper 3.3.1
ZOOKEEPER-744
ZOOKEEPER-790


{code}
2010-07-15 02:43:57,181 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileSnap@82] - Reading snapshot /data/zookeeper/version-2/snapshot.2300001ac2
2010-07-15 02:43:57,384 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@649] - New election. My id =  1, Proposed zxid = 154618826848
2010-07-15 02:43:57,385 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@689] - Notification: 1, 154618826848, 4, 1, LOOKING, LOOKING, 1
2010-07-15 02:43:57,385 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@799] - Notification: 2, 146030952153, 3, 1, LOOKING, LEADING, 2
2010-07-15 02:43:57,385 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@799] - Notification: 2, 146030952153, 3, 1, LOOKING, FOLLOWING, 3
2010-07-15 02:43:57,385 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@642] - FOLLOWING
2010-07-15 02:43:57,385 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@151] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /data/zookeeper/txlog/version-2 snapdir /data/zookeeper/version-2
2010-07-15 02:43:57,387 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@71] - Leader epoch 23 is less than our epoch 24
2010-07-15 02:43:57,387 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following the leader 
java.io.IOException: Error: Epoch of leader is lower
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:73)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
2010-07-15 02:43:57,387 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called 
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:648)
{code}
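For reading the log above: a zxid packs the epoch into its high 32 bits and a per-epoch counter into its low 32 bits. The sketch below is a hypothetical helper class (not part of ZooKeeper) that splits the decimal zxids from the election lines. Decoded this way, server 1's proposed zxid 154618826848 is 0x2400001060 (epoch 0x24), and the zxid in server 2's notification, 146030952153, is 0x22001f7ed9 (epoch 0x22). That fits the FATAL line printing epochs in hex, with 23 presumably being the leader's next epoch (0x22 + 1); this reading is an assumption, not a quote of the Follower code.

{code:java}
// Hypothetical helper for reading the log above; not part of ZooKeeper.
// Assumes the usual zxid layout: high 32 bits = epoch, low 32 bits = counter.
final class ZxidDecode {
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    static String describe(long zxid) {
        return String.format("zxid 0x%x: epoch 0x%x, counter 0x%x",
                zxid, epoch(zxid), counter(zxid));
    }

    public static void main(String[] args) {
        // Server 1's "Proposed zxid" from the "New election" line.
        System.out.println(describe(154618826848L));
        // Zxid carried in server 2's notification (state LEADING).
        System.out.println(describe(146030952153L));
        // The snapshot file name suffix is already hex: snapshot.2300001ac2.
        System.out.println(describe(0x2300001ac2L));
    }
}
{code}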


I followed the recipe @vishal provided for reproducing the issue:

(a) Stop one follower in a three-node cluster.
(b) Get some tea while it falls behind.
(c) Start the node stopped in (a).
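The recipe is easy to script. This is a minimal sketch that assumes the follower is controlled with ZooKeeper's bin/zkServer.sh stop/start and that clients keep writing to the other two nodes during the wait; the class name, the script path, and the four-minute wait are placeholders. The Runner indirection exists only so the steps can be checked without touching a real cluster.

{code:java}
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Sketch of the (a)/(b)/(c) recipe; class and method names are hypothetical.
final class LaggingFollowerRepro {
    /** Runs one command on the follower host; replaceable so the steps can be tested. */
    interface Runner {
        void run(List<String> command) throws IOException, InterruptedException;
    }

    /** Default runner: starts the command locally and waits for it to exit. */
    static final Runner LOCAL = command -> {
        Process p = new ProcessBuilder(command).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("command failed with exit code " + exit + ": " + command);
        }
    };

    static void reproduce(Runner runner, String zkServerScript, long waitMillis)
            throws IOException, InterruptedException {
        runner.run(List.of(zkServerScript, "stop"));  // (a) stop one follower
        Thread.sleep(waitMillis);                     // (b) it falls behind while clients keep writing
        runner.run(List.of(zkServerScript, "start")); // (c) start it again
    }

    public static void main(String[] args) throws Exception {
        // Placeholder path; pass the real bin/zkServer.sh location as the first argument.
        String script = args.length > 0 ? args[0] : "/opt/zookeeper/bin/zkServer.sh";
        reproduce(LOCAL, script, TimeUnit.MINUTES.toMillis(4));
    }
}
{code}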


These two lines bracket the outage: the last line the follower logged before it was stopped, and the first line it logged after it was started again.

{code}
2010-07-15 02:35:36,398 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1661] - Established session 0x229aa13cfc6276b with negotiated timeout 10000 for client /10.209.45.114:34562
2010-07-15 02:39:18,907 - INFO  [main:QuorumPeerConfig@90] - Reading configuration from: /etc/zookeeper/conf/zoo.cfg
{code}


This is the first "Leader epoch" line after the restart. The interesting part is everything logged between the restart and this line.

{code}
2010-07-15 02:39:43,339 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@71] - Leader epoch 23 is less than our epoch 24
{code}
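To pull that window out of a follower log, here is a small sketch that keeps lines whose leading timestamp falls in a range and carries stack-trace continuation lines along with the line before them. It assumes the "yyyy-MM-dd HH:mm:ss,SSS" prefix seen in these logs; the class name and the hard-coded window are illustrative. With the timestamps above, the restart (02:39:18,907) is 24.432 s before the first FATAL (02:39:43,339), and the follower was down for at most 222.509 s (about 3 min 42.5 s), counting from its last line before the stop (02:35:36,398).

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Sketch: extract the lines logged between two timestamps. Names are illustrative.
final class LogWindow {
    // Matches the prefix of these log lines, e.g. "2010-07-15 02:39:18,907".
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final int TS_LEN = "2010-07-15 02:39:18,907".length();

    /** Leading timestamp of a line, or null for continuation lines such as stack frames. */
    static LocalDateTime timestamp(String line) {
        if (line.length() < TS_LEN) {
            return null;
        }
        try {
            return LocalDateTime.parse(line.substring(0, TS_LEN), TS);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    /** Lines stamped within [from, to]; untimestamped lines go with the line before them. */
    static List<String> between(List<String> lines, LocalDateTime from, LocalDateTime to) {
        List<String> out = new ArrayList<>();
        boolean inWindow = false;
        for (String line : lines) {
            LocalDateTime t = timestamp(line);
            if (t != null) {
                inWindow = !t.isBefore(from) && !t.isAfter(to);
            }
            if (inWindow) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the follower's log file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        LocalDateTime restart = LocalDateTime.parse("2010-07-15 02:39:18,907", TS);
        LocalDateTime firstFatal = LocalDateTime.parse("2010-07-15 02:39:43,339", TS);
        between(lines, restart, firstFatal).forEach(System.out::println);
    }
}
{code}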

> zookeeper servers should commit the new leader txn to their logs.
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-335
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.1.0
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>            Priority: Blocker
>             Fix For: 3.4.0
>
>         Attachments: faultynode-vishal.txt, zk.log.gz, zklogs.tar.gz
>
>
> Currently the zookeeper followers do not commit the new leader election. This will cause problems in failure scenarios: a follower may ack the same leader txn id twice, possibly for two different intermittent leaders, allowing them to propose two different txns with the same zxid.
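To make the invariant in the description concrete: a follower should durably record the epoch it acknowledges before sending the ack, so that even after a restart it never acknowledges the same epoch for a second leader. The sketch below only illustrates that idea and is not the fix that went into ZooKeeper; AckedEpochStore, tryAck, and the single-file layout are hypothetical.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical follower-side guard illustrating the invariant above; not ZooKeeper code.
final class AckedEpochStore {
    private final Path file;

    AckedEpochStore(Path file) {
        this.file = file;
    }

    /** Highest epoch acknowledged so far, or -1 if nothing has been recorded yet. */
    long load() throws IOException {
        if (!Files.exists(file)) {
            return -1L;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    /** Writes the epoch to a temp file, fsyncs it, then renames it over the old record. */
    private void store(long epoch) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        ByteBuffer buf = ByteBuffer.wrap(Long.toString(epoch).getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // data must be on disk before the ack goes out
        }
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    /**
     * Called when a prospective leader announces newEpoch. Acks (returns true) only if the
     * epoch is newer than every epoch acked before, persisting it first, so a restarted
     * follower cannot ack the same epoch for a second leader.
     */
    synchronized boolean tryAck(long newEpoch) throws IOException {
        if (newEpoch <= load()) {
            return false;
        }
        store(newEpoch);
        return true;
    }
}
{code}

The temp-file-plus-rename step keeps a crash from leaving a half-written record; a fully durable version would also fsync the parent directory after the rename.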

-- 
This message is automatically generated by JIRA.