Posted to dev@zookeeper.apache.org by "Michi Mutsuzaki (JIRA)" <ji...@apache.org> on 2014/03/14 05:02:42 UTC

[jira] [Commented] (ZOOKEEPER-1894) ObserverTest.testObserver fails consistently

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934542#comment-13934542 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1894:
--------------------------------------------

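It looks like the observer is sending a lot of messages to itself during leader election. With the log statement from the patch below applied, the observer (myid:3) repeatedly logs received messages with sid=3, its own id, several of them within the same millisecond.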

{noformat}
diff --git src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java
index 9876c3d..1e28209 100644
--- src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java
+++ src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java
@@ -248,6 +248,10 @@ public class FastLeaderElection implements Election {
                         long relectionEpoch = response.buffer.getLong();
                         long rpeerepoch;
                         
+                        LOG.info("Received a message sid={} state={} " +
+                                 "rleader={} rzxid={} relectionEpoch={}",
+                                 response.sid, rstate, rleader,
+                                 rzxid, relectionEpoch);
                         if(!backCompatibility28){
                            rpeerepoch = response.buffer.getLong();
                         } else {
{noformat}

{noformat}
    [junit] 2014-03-13 20:49:49,771 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@251] - Received a message sid=3 state=3 rleader=2 rzxid=0 relectionEpoch=1
    [junit] 2014-03-13 20:49:49,772 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@251] - Received a message sid=3 state=3 rleader=2 rzxid=0 relectionEpoch=1
    [junit] 2014-03-13 20:49:49,772 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@251] - Received a message sid=3 state=3 rleader=2 rzxid=0 relectionEpoch=1
    [junit] 2014-03-13 20:49:49,772 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@251] - Received a message sid=3 state=3 rleader=2 rzxid=0 relectionEpoch=1
    [junit] 2014-03-13 20:49:49,772 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@251] - Received a message sid=3 state=3 rleader=2 rzxid=0 relectionEpoch=1
    [junit] 2014-03-13 20:49:49,773 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@251] - Received a message sid=3 state=3 rleader=2 rzxid=0 relectionEpoch=1
    ...
{noformat}
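
To put a number on how many of these messages the observer receives from itself, the log can be scanned for lines where the receiving peer's myid equals the sender's sid. The following is only a minimal sketch: the class name SelfMessageCounter and its method are hypothetical helpers, not part of ZooKeeper, and the regular expression assumes the exact log format produced by the patch above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ZooKeeper: counts log lines in which
// a peer reports receiving an election message from itself.
public class SelfMessageCounter {

    // Assumes the format emitted by the patched LOG.info call, e.g.
    // "... [myid:3] - INFO  [...] - Received a message sid=3 state=3 ..."
    private static final Pattern LINE = Pattern.compile(
            "\\[myid:(\\d+)\\].*Received a message sid=(\\d+)\\b");

    // Returns the number of lines where the logging peer's myid
    // is the same as the sid of the message it received.
    static long countSelfMessages(List<String> lines) {
        long count = 0;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find() && m.group(1).equals(m.group(2))) {
                count++;
            }
        }
        return count;
    }

    // Reads a log from standard input and prints the count.
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        System.out.println("self-addressed messages: "
                + countSelfMessages(lines));
    }
}
```

Piping the unpacked test log into this class on standard input would give a rough count of self-addressed messages per run, which could then be compared before and after any fix.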

> ObserverTest.testObserver fails consistently
> --------------------------------------------
>
>                 Key: ZOOKEEPER-1894
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1894
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.5.0
>         Environment: ubuntu 13.10
> Server environment:java.version=1.7.0_51
> Server environment:java.vendor=Oracle Corporation
>            Reporter: Michi Mutsuzaki
>             Fix For: 3.5.0
>
>         Attachments: TEST-org.apache.zookeeper.test.ObserverTest.txt.gz
>
>
> ObserverTest.testObserver fails consistently on my box. It looks like the observer (myid:3) calls QuorumPeer.getQuorumVerifier() in a tight loop, and the leader (myid:2) is not getting enough CPU time to synchronize with the follower and the observer. The test passes if I increase ClientBase.CONNECTION_TIMEOUT from 30 seconds to 120 seconds. I'll attach a log file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)