You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Flavio Junqueira (JIRA)" <ji...@apache.org> on 2013/05/08 09:55:15 UTC

[jira] [Commented] (ZOOKEEPER-1700) FLETest consistently failing - setLastSeenQuorumVerifier seems to be hanging

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651689#comment-13651689 ] 

Flavio Junqueira commented on ZOOKEEPER-1700:
---------------------------------------------

Pat, I had it running in a loop and it eventually failed, but it was not testJoin, it was testTripleElection. I've observed the same warning for setLastSeenQuorumVerifier in testJoin, but again it didn't cause any harm.

The part that I've seen in the logs that called my attention was this one:

{noformat}
2013-05-08 09:38:50,827 [myid:] - INFO  [Thread-37:FLETest$LEThread@144] - Finished election: 3, 4
2013-05-08 09:38:50,827 [myid:] - INFO  [Thread-39:FLETest$LEThread@144] - Finished election: 4, 4
2013-05-08 09:38:50,827 [myid:] - INFO  [Thread-39:FLETest$LEThread@158] - I'm the leader: 4
2013-05-08 09:38:50,828 [myid:] - INFO  [Thread-33:FLETest$LEThread@144] - Finished election: 1, 4
2013-05-08 09:38:50,828 [myid:] - INFO  [Thread-35:FLETest$LEThread@144] - Finished election: 2, 4
2013-05-08 09:38:50,829 [myid:] - ERROR [Thread-35:NIOServerCnxnFactory$1@92] - Thread Thread[Thread-35,5,main] died
java.lang.IllegalMonitorStateException
        at java.lang.Object.notify(Native Method)
        at org.apache.zookeeper.test.FLETest$LEThread.run(FLETest.java:205)
2013-05-08 09:38:50,829 [myid:] - INFO  [Thread-31:FLETest$LEThread@144] - Finished election: 0, 4
2013-05-08 09:38:50,829 [myid:] - ERROR [Thread-31:NIOServerCnxnFactory$1@92] - Thread Thread[Thread-31,5,main] died
java.lang.IllegalMonitorStateException
        at java.lang.Object.notify(Native Method)
        at org.apache.zookeeper.test.FLETest$LEThread.run(FLETest.java:205)
2013-05-08 09:38:50,928 [myid:] - ERROR [Thread-37:NIOServerCnxnFactory$1@92] - Thread Thread[Thread-37,5,main] died
java.lang.IllegalMonitorStateException
        at java.lang.Object.notify(Native Method)
        at org.apache.zookeeper.test.FLETest$LEThread.run(FLETest.java:205)
{noformat}

I think the threads are not joining because they are dying. 
                
> FLETest consistently failing - setLastSeenQuorumVerifier seems to be hanging
> ----------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1700
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1700
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.5.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.5.0
>
>
> I'm consistently seeing a failure on my laptop when running the FLETest "testJoin" test. What seems to be happening is that the call to setLastSeenQuorumVerifier is hanging.
> See the following log from the test, notice 17:35:57 for the period in question. Note that I turned on debug logging and added a few log messages around the call to setLastSeenQuorumVerifier (you can see the code enter but never leave)
> Note: I've applied ZOOKEEPER-1324 to trunk code and then run this test but that doesn't seem to help. Also note that this test is passing consistently when run against branch-3.4.
> {noformat}
> 2013-05-07 17:35:57,859 [myid:] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Follower@65] - FOLLOWING - LEADER ELECTION TOOK - 16
> 2013-05-07 17:35:57,859 [myid:] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@436] - LEADING - LEADER ELECTION TOOK - 17
> 2013-05-07 17:35:57,863 [myid:] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:FileTxnSnapLog@270] - Snapshotting: 0x0 to /home/phunt/dev/zookeeper-trunk/build/test/tmp/test3690487600947307322.junit.dir/version-2/snapshot.0
> 2013-05-07 17:35:57,873 [myid:] - INFO  [LearnerHandler-/127.0.0.1:34262:LearnerHandler@269] - Follower sid: 0 : info : 0.0.0.0:11222:11223:participant;0.0.0.0:11221
> 2013-05-07 17:35:57,878 [myid:] - INFO  [LearnerHandler-/127.0.0.1:34262:LearnerHandler@328] - Synchronizing with Follower sid: 0 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x0
> 2013-05-07 17:35:57,878 [myid:] - DEBUG [LearnerHandler-/127.0.0.1:34262:LearnerHandler@395] - committedLog is empty but leader and follower are in sync, zxid=0x0
> 2013-05-07 17:35:57,878 [myid:] - INFO  [LearnerHandler-/127.0.0.1:34262:LearnerHandler@404] - Sending DIFF
> 2013-05-07 17:35:57,879 [myid:] - DEBUG [LearnerHandler-/127.0.0.1:34262:LearnerHandler@411] - Sending NEWLEADER message to 0
> 2013-05-07 17:35:57,880 [myid:] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@331] - Getting a diff from the leader 0x0
> 2013-05-07 17:35:57,885 [myid:] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@457] - Learner received NEWLEADER message
> 2013-05-07 17:35:57,885 [myid:] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@460] - NEWLEADER calling configfromstring
> 2013-05-07 17:35:57,885 [myid:] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@462] - NEWLEADER setting quorum verifier
> 2013-05-07 17:35:57,886 [myid:] - WARN  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:QuorumPeer@1218] - setLastSeenQuorumVerifier called with stale config 0. Current version: 0
> 2013-05-07 17:36:01,880 [myid:] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@585] - Shutting down
> 2013-05-07 17:36:01,881 [myid:] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@591] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: Waiting for a quorum of followers, only synced with sids: [ [1] ]
> 	at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:591)
> 	at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:487)
> 	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:949)
> 2013-05-07 17:36:01,881 [myid:] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:ZooKeeperServer@398] - shutting down
> 2013-05-07 17:36:01,881 [myid:] - INFO  [LearnerCnxAcceptor-0.0.0.0/0.0.0.0:11225:Leader$LearnerCnxAcceptor@398] - exception while shutting down acceptor: java.net.SocketException: Socket closed
> 2013-05-07 17:36:01,882 [myid:] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer@979] - PeerState set to LOOKING
> 2013-05-07 17:36:01,882 [myid:] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer@863] - LOOKING
> 2013-05-07 17:36:01,883 [myid:] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer@792] - Initializing leader election protocol...
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira