You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Karolos Antoniadis (JIRA)" <ji...@apache.org> on 2019/07/31 01:24:00 UTC

[jira] [Created] (ZOOKEEPER-3479) Logging false leader election times

Karolos Antoniadis created ZOOKEEPER-3479:
---------------------------------------------

             Summary: Logging false leader election times
                 Key: ZOOKEEPER-3479
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3479
             Project: ZooKeeper
          Issue Type: Bug
          Components: leaderElection
    Affects Versions: 3.5.5
            Reporter: Karolos Antoniadis
         Attachments: server1.txt, server2.txt

There seems to be a problem with the logging of leader election times: the logged times are much smaller than the actual time it took for the leader election to complete.

This bug can be easily reproduced by following these steps:

1) Run a ZK cluster of 3 servers

2) Kill the server that is currently the leader

3) The log files of the remaining 2 servers contain false leader election times

 

In the attached files you can see the log files of the remaining 2 serve. For brevity, I removed the parts before and after the leader election from the log files.

For example, in {{server1.txt}} we can see that:

 
{code:java}
2019-07-31 00:57:31,852 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer@1318] - PeerState set to LOOKING
2019-07-31 00:57:31,853 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer@1193] - LOOKING
2019-07-31 00:57:31,853 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):FastLeaderElection@885] - New election. My id = 1, proposed zxid=0x100000001
[...]
2019-07-31 00:57:32,272 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):Follower@69] - FOLLOWING - LEADER ELECTION TOOK - 1 MS{code}
Leader election supposedly took only 1ms, but in reality it took (32,272 - 31,853) = 419ms!

The reason for this bug seems to be the introduction of this line
{code:java}
start_fle = Time.currentElapsedTime();{code}
(seen here [https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1402]) 

back in this commit [https://github.com/apache/zookeeper/commit/5428cd4bc963c2e653a260c458a8a8edf3fa08ef].

 

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)