Posted to dev@zookeeper.apache.org by "KangYin (JIRA)" <ji...@apache.org> on 2016/09/08 05:37:20 UTC

[jira] [Comment Edited] (ZOOKEEPER-2550) FollowerResyncConcurrencyTest failed in ZooKeeper 3.3.3

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15472847#comment-15472847 ] 

KangYin edited comment on ZOOKEEPER-2550 at 9/8/16 5:36 AM:
------------------------------------------------------------

Thanks Patrick.
I have checked ZOOKEEPER-1264, but the cause of the _FollowerResyncConcurrencyTest_ failure in this issue is not the same as in ZOOKEEPER-1264. It appears in both 3.3.3 and 3.4+.
As I mentioned in the issue's description, the failure occurs in _FollowerResyncConcurrencyTest.java_ at line 92.

{code:title=FollowerResyncConcurrencyTest.java|borderStyle=solid}
        QuorumUtil qu = new QuorumUtil(1);
        qu.startAll();
        CountdownWatcher watcher1 = new CountdownWatcher();
        CountdownWatcher watcher2 = new CountdownWatcher();
        CountdownWatcher watcher3 = new CountdownWatcher();

        int index = 1;
        while (qu.getPeer(index).peer.leader == null)
            index++;

        Leader leader = qu.getPeer(index).peer.leader;

        assertNotNull(leader);
        /*
         * Reusing the index variable to select a follower to connect to
         */
        index = (index == 1) ? 2 : 1;
        qu.shutdown(index);
        final ZooKeeper zk3 = new DisconnectableZooKeeper("127.0.0.1:" + qu.getPeer(3).peer.getClientPort(), 1000, watcher3);
        watcher3.waitForConnected(CONNECTION_TIMEOUT);  // Failed here
        zk3.create("/mybar", null, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
{code}
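To make the peer selection in this excerpt concrete, here is a small model that replays only its index arithmetic for each possible leader in the 3-peer ensemble. The class and method names are hypothetical and nothing here touches QuorumUtil. It shows that the peer being shut down is always 1 or 2, while the client always targets peer 3, so when peer 3 happens to be the leader (as in my runs) zk3 talks to the leader directly, even though the comment in the test speaks of selecting a follower.

```java
/**
 * Hypothetical model of the peer-selection logic in the test excerpt.
 * Only the index arithmetic comes from the test; the names are mine.
 */
public class PeerSelectionSketch {

    /** The test always connects zk3 to peer 3, whichever peer leads. */
    static final int CLIENT_TARGET = 3;

    /** Mirrors: index = (index == 1) ? 2 : 1; i.e. the peer the test shuts down. */
    static int peerToShutDown(int leaderId) {
        return (leaderId == 1) ? 2 : 1;
    }

    /** Summarizes which peer goes down and whether zk3 lands on the leader. */
    static String describe(int leaderId) {
        int down = peerToShutDown(leaderId);
        String role = (CLIENT_TARGET == leaderId) ? "leader" : "follower";
        return "leader=" + leaderId + ", shut down=" + down
                + ", zk3 connects to peer " + CLIENT_TARGET + " (" + role + ")";
    }

    public static void main(String[] args) {
        // One line per possible leader in a 3-peer ensemble (QuorumUtil(1)).
        for (int leaderId = 1; leaderId <= 3; leaderId++) {
            System.out.println(describe(leaderId));
        }
    }
}
```

Only the leader=3 case puts the client on the leader, which matches the stack trace going through Leader$ToBeAppliedRequestProcessor.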

Here is the sequence of events, as clearly as I can describe it:
(1) Initialize 3 quorum peers and start them all (peer 3 is the leader in my case).
(2) Shut down peer 1.
(3) Create a DisconnectableZooKeeper that connects to peer 3's client port with watcher3.
(4) watcher3 waits for the connection, but fails after CONNECTION_TIMEOUT.
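Step (4) is easier to follow with the client-side timeouts in mind. As far as I can tell from ClientCnxn in 3.3.x (please double-check against your source tree, this is my reading and may differ), the client uses a connect timeout of the session timeout divided by the number of servers in the connect string, and a read timeout of two thirds of the session timeout. While not yet connected, it drops the socket once nothing has been heard for the connect timeout, then reconnects. The sketch below models only that arithmetic for the 1000 ms session timeout and single address the test passes to DisconnectableZooKeeper; the class and method names are hypothetical.

```java
/**
 * Sketch of the client-side timeout arithmetic as I read it in ClientCnxn
 * (3.3.x). Hypothetical names; a model, not ZooKeeper code.
 */
public class ClientTimeoutSketch {

    /** Connect timeout: the session timeout split across the servers in the connect string. */
    static int connectTimeout(int sessionTimeoutMs, int serverCount) {
        return sessionTimeoutMs / serverCount;
    }

    /** Read timeout once connected: two thirds of the session timeout (integer math). */
    static int readTimeout(int sessionTimeoutMs) {
        return sessionTimeoutMs * 2 / 3;
    }

    /**
     * True when the client would give up on the current socket: the time left
     * (applicable timeout minus time since the last receive) is used up.
     */
    static boolean timedOut(boolean connected, int idleRecvMs,
                            int sessionTimeoutMs, int serverCount) {
        int limit = connected ? readTimeout(sessionTimeoutMs)
                              : connectTimeout(sessionTimeoutMs, serverCount);
        return limit - idleRecvMs <= 0;
    }

    public static void main(String[] args) {
        // The test passes a 1000 ms session timeout and a single "127.0.0.1:port" address.
        int session = 1000;
        int servers = 1;
        System.out.println("connectTimeout = " + connectTimeout(session, servers) + " ms");
        System.out.println("readTimeout    = " + readTimeout(session) + " ms");
        // If the server never answers the session request, the client gives up after
        // about 1 s and reconnects, until waitForConnected(CONNECTION_TIMEOUT) expires.
        System.out.println("timed out after 1001 ms idle? "
                + timedOut(false, 1001, session, servers));
    }
}
```

Under this reading, the "have not heard from server in 1001ms for sessionid 0x0" line in the original report means the server never replied to the session request at all, rather than the session expiring later.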

I don't understand why the connection fails. After checking the log messages, I found the following entries, which are probably related:

{noformat}
2016-09-05 13:56:55,000 - INFO [SyncThread:3:FileTxnLog@197] - Creating new log file: log.100000001
2016-09-05 13:56:55,000 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:11235:Follower@116] - Got zxid 0x100000001 expected 0x1
2016-09-05 13:56:55,000 - INFO [SyncThread:2:FileTxnLog@197] - Creating new log file: log.100000001
2016-09-05 13:56:55,078 - ERROR [CommitProcessor:3:CommitProcessor@146] - Unexpected exception causing CommitProcessor to exit
java.lang.AssertionError
at org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:66)
at org.apache.zookeeper.server.NIOServerCnxn.finishSessionInit(NIOServerCnxn.java:1552)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:183)
at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:540)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2016-09-05 13:56:55,078 - INFO [CommitProcessor:3:CommitProcessor@148] - CommitProcessor exited loop!
{noformat}
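My guess at why the later reconnect attempts also hang: the AssertionError escapes from CommitProcessor.run, and the log then says the processor exited its loop. (A java.lang.AssertionError without a message is what a Java assert statement throws when the JVM runs with -ea.) If that single thread is the only consumer of committed requests on the leader, the createSession request that triggered the error never gets a reply, and nothing queued after it is processed either. Below is a simplified model of that failure mode using a plain LinkedBlockingQueue and one worker thread. It is not ZooKeeper's CommitProcessor; the class, method, and request names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Simplified model, not ZooKeeper code: a single worker thread drains a
 * queue of requests. An Error thrown while handling one request ends the
 * loop, so that request and everything queued after it goes unanswered.
 */
public class DeadProcessorSketch {

    /** Returns the requests the worker handled before it stopped. */
    static List<String> run(List<String> requests, String poison) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<String>(requests);
        List<String> answered = Collections.synchronizedList(new ArrayList<String>());

        Thread worker = new Thread(() -> {
            try {
                String req;
                while ((req = queue.poll()) != null) {
                    if (req.equals(poison)) {
                        // Stands in for the AssertionError from MBeanRegistry.register.
                        throw new AssertionError();
                    }
                    answered.add(req);
                }
            } catch (Throwable t) {
                // Mirrors the log line "Unexpected exception causing CommitProcessor to exit".
                System.out.println("worker exited: " + t);
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<String>(answered);
    }

    public static void main(String[] args) {
        // The first createSession trips the error; the retry (#B) is never handled.
        List<String> requests = Arrays.asList("setData#1", "createSession#A", "createSession#B");
        System.out.println("answered: " + run(requests, "createSession#A"));
    }
}
```

In this model the worker reports the error once and then simply stops, which would look exactly like a client that can open a socket but never completes a session.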



> FollowerResyncConcurrencyTest failed in ZooKeeper 3.3.3
> -------------------------------------------------------
>
>                 Key: ZOOKEEPER-2550
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2550
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection, quorum, server, tests
>    Affects Versions: 3.3.3
>         Environment: Windows 10,
> Java 1.8.0,
> IDEA 2016.1.4,
> JUnit 4.8.1
>            Reporter: KangYin
>            Priority: Blocker
>              Labels: test
>
> I'm studying the tests of ZooKeeper 3.3.3, but I got a test failure when I ran _testResyncBySnapThenDiffAfterFollowerCrashes_ in _FollowerResyncConcurrencyTest.java_.
> {quote}
> 2016-09-05 13:57:35,072 - INFO  [main:QuorumBase@307] - FINISHED testResyncBySnapThenDiffAfterFollowerCrashes
> java.util.concurrent.TimeoutException: Did not connect
> 	at org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:119)
> 	at org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:95)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at junit.framework.TestCase.runTest(TestCase.java:168)
> 	at junit.framework.TestCase.runBare(TestCase.java:134)
> 	at junit.framework.TestResult$1.protect(TestResult.java:110)
> 	at junit.framework.TestResult.runProtected(TestResult.java:128)
> 	at junit.framework.TestResult.run(TestResult.java:113)
> 	at junit.framework.TestCase.run(TestCase.java:124)
> 	at junit.framework.TestSuite.runTest(TestSuite.java:232)
> 	at junit.framework.TestSuite.run(TestSuite.java:227)
> 	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
> 	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:119)
> 	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
> 	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:234)
> 	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:74)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> {quote}
> The failure happened in _FollowerResyncConcurrencyTest.java_ at line 92:
> {quote}
>         index = (index == 1) ? 2 : 1;
>         qu.shutdown(index);
>         final ZooKeeper zk3 = new DisconnectableZooKeeper("127.0.0.1:" + qu.getPeer(3).peer.getClientPort(), 1000,watcher3);
>         {color:red}watcher3.waitForConnected(CONNECTION_TIMEOUT);{color}
>         zk3.create("/mybar", null, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
> {quote}
> I checked the log messages, and I suspect the cause is the following ERROR (marked in blue):
> {quote}
> 2016-09-05 13:56:54,928 - INFO  [main-SendThread():ClientCnxn$SendThread@1041] - Opening socket connection to server /127.0.0.1:11237
> 2016-09-05 13:56:54,930 - INFO  [main-SendThread(127.0.0.1:11237):ClientCnxn$SendThread@949] - Socket connection established to 127.0.0.1/127.0.0.1:11237, initiating session
> 2016-09-05 13:56:54,930 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11237:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:33566
> 2016-09-05 13:56:54,957 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11237:NIOServerCnxn@777] - Client attempting to establish new session at /127.0.0.1:33566
>  {color:blue}
> 2016-09-05 13:56:55,000 - INFO  [SyncThread:3:FileTxnLog@197] - Creating new log file: log.100000001
> 2016-09-05 13:56:55,000 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:11235:Follower@116] - Got zxid 0x100000001 expected 0x1
> 2016-09-05 13:56:55,000 - INFO  [SyncThread:2:FileTxnLog@197] - Creating new log file: log.100000001
> 2016-09-05 13:56:55,078 - ERROR [CommitProcessor:3:CommitProcessor@146] - Unexpected exception causing CommitProcessor to exit
> java.lang.AssertionError
> 	at org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:66)
> 	at org.apache.zookeeper.server.NIOServerCnxn.finishSessionInit(NIOServerCnxn.java:1552)
> 	at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:183)
> 	at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:540)
> 	at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
> 2016-09-05 13:56:55,078 - INFO  [CommitProcessor:3:CommitProcessor@148] - CommitProcessor exited loop!
> {color}
> 2016-09-05 13:56:55,931 - INFO  [main-SendThread(127.0.0.1:11237):ClientCnxn$SendThread@1157] - Client session timed out, have not heard from server in 1001ms for sessionid 0x0, closing socket connection and attempting reconnect
> 2016-09-05 13:56:58,035 - INFO  [main-SendThread(127.0.0.1:11237):ClientCnxn$SendThread@1041] - Opening socket connection to server 127.0.0.1/127.0.0.1:11237
> 2016-09-05 13:56:58,036 - INFO  [main-SendThread(127.0.0.1:11237):ClientCnxn$SendThread@949] - Socket connection established to 127.0.0.1/127.0.0.1:11237, initiating session
> {quote}
> I would really appreciate any help from you experts.
> Thanks.
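On the WARN line in the blue block: a ZooKeeper zxid packs the leader epoch into its high 32 bits and a per-epoch counter into its low 32 bits, so "Got zxid 0x100000001 expected 0x1" means the follower received transaction 1 of epoch 1 while it was expecting transaction 1 of epoch 0. The transaction log name log.100000001 is likewise the hex zxid of the first transaction written to that file. The helper below is a self-contained sketch of that decoding; the class and method names are mine, not ZooKeeper API.

```java
/** Sketch: split a zxid into epoch (high 32 bits) and counter (low 32 bits). */
public class ZxidSketch {

    /** Leader epoch: the high 32 bits (unsigned shift). */
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    /** Per-epoch transaction counter: the low 32 bits. */
    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    /** Log file name for the first zxid it holds, e.g. "log.100000001". */
    static String logFileName(long zxid) {
        return "log." + Long.toHexString(zxid);
    }

    static String describe(long zxid) {
        return String.format("0x%x = epoch %d, counter %d", zxid, epoch(zxid), counter(zxid));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x100000001L)); // the zxid the follower got
        System.out.println(describe(0x1L));         // the zxid it expected
        System.out.println(logFileName(0x100000001L));
    }
}
```

Running main prints "0x100000001 = epoch 1, counter 1", "0x1 = epoch 0, counter 1" and "log.100000001".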



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)