You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Michael K. Edwards (JIRA)" <ji...@apache.org> on 2018/11/26 06:47:00 UTC

[jira] [Commented] (ZOOKEEPER-3201) Flaky test: org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698526#comment-16698526 ] 

Michael K. Edwards commented on ZOOKEEPER-3201:
-----------------------------------------------

What seems to be happening here is that this "second chance" logic isn't sufficient to ensure that we don't hit {{ConnectionLossException}} again during the second attempt to test {{zk.exists()}}.

{noformat}
    /**
     * Ensure the client is able to talk to the server.
     * 
     * @param idx the idx of the server the client is talking to
     */
    private void checkClientConnected(int idx) throws Exception {
        ZooKeeper zk = getClient(idx);
        if (zk == null) {
            return;
        }
        try {
            Assert.assertNull(zk.exists("/foofoofoo-connected", false));
        } catch (ConnectionLossException e) {
            // second chance...
            // in some cases, leader change in particular, the timing is
            // very tricky to get right in order to assure that the client has
            // disconnected and reconnected. In some cases the client will
            // disconnect, then attempt to reconnect before the server is
            // back, in which case we'll see another connloss on the operation
            // in the try, this catches that case and waits for the server
            // to come back
            PeerStruct peer = qu.getPeer(idx);
            Assert.assertTrue("Waiting for server down", ClientBase.waitForServerUp(
                    "127.0.0.1:" + peer.clientPort, ClientBase.CONNECTION_TIMEOUT));

            Assert.assertNull(zk.exists("/foofoofoo-connected", false));
        }
    }
{noformat}

> Flaky test: org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart
> --------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3201
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3201
>             Project: ZooKeeper
>          Issue Type: Sub-task
>            Reporter: Michael K. Edwards
>            Priority: Major
>
> Encountered when running tests locally:
> {noformat}
> 64429     [junit] 2018-11-25 22:28:12,729 [myid:127.0.0.1:27389] - INFO  [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0      .1:27389. Will not attempt to authenticate using SASL (unknown error)
> 64430     [junit] 2018-11-25 22:28:12,730 [myid:127.0.0.1:27389] - INFO  [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@955] - Socket connection established, initiating session, cli      ent: /127.0.0.1:47668, server: localhost/127.0.0.1:27389
> 64431     [junit] 2018-11-25 22:28:12,734 [myid:] - INFO  [NIOWorkerThread-1:Learner@117] - Revalidating client: 0x10000a9cccf0000
> 64432     [junit] 2018-11-25 22:28:12,743 [myid:127.0.0.1:27389] - INFO  [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1390] - Session establishment complete on server localhost/12      7.0.0.1:27389, sessionid = 0x10000a9cccf0000, negotiated timeout = 30000
> 64433     [junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO  [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0      .1:27392. Will not attempt to authenticate using SASL (unknown error)
> 64434     [junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO  [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@955] - Socket connection established, initiating session, cli      ent: /127.0.0.1:52160, server: localhost/127.0.0.1:27392
> 64435     [junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0      .1:27395. Will not attempt to authenticate using SASL (unknown error)
> 64436     [junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@955] - Socket connection established, initiating session, cli      ent: /127.0.0.1:47256, server: localhost/127.0.0.1:27395
> 64437     [junit] 2018-11-25 22:28:13,017 [myid:] - INFO  [NIOWorkerThread-4:ZooKeeperServer@1030] - Refusing session request for client /127.0.0.1:47256 as it has seen zxid 0x300000000 our       last zxid is 0x2fffffffe client must try another server
> 64438     [junit] 2018-11-25 22:28:13,018 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1236] - Unable to read additional data from server sessionid       0x30000a9ccd20000, likely server has closed socket, closing socket connection and attempting reconnect
> 64439     [junit] 2018-11-25 22:28:13,023 [myid:127.0.0.1:27392] - INFO  [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1390] - Session establishment complete on server localhost/12      7.0.0.1:27392, sessionid = 0x20000a9d0940000, negotiated timeout = 30000
> 64440     [junit] 2018-11-25 22:28:13,119 [myid:] - INFO  [main:FourLetterWordMain@87] - connecting to 127.0.0.1 27395
> 64441     [junit] 2018-11-25 22:28:13,120 [myid:] - INFO  [NIOWorkerThread-1:NIOServerCnxn@518] - Processing stat command from /127.0.0.1:47258
> 64442     [junit] 2018-11-25 22:28:13,121 [myid:] - INFO  [NIOWorkerThread-1:StatCommand@53] - Stat command output
> 64443     [junit] 2018-11-25 22:28:14,134 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0      .1:27395. Will not attempt to authenticate using SASL (unknown error)
> 64444     [junit] 2018-11-25 22:28:14,135 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@955] - Socket connection established, initiating session, cli      ent: /127.0.0.1:47312, server: localhost/127.0.0.1:27395
> 64445     [junit] 2018-11-25 22:28:14,135 [myid:] - INFO  [NIOWorkerThread-2:ZooKeeperServer@1030] - Refusing session request for client /127.0.0.1:47312 as it has seen zxid 0x300000000 our       last zxid is 0x2fffffffe client must try another server
> 64446     [junit] 2018-11-25 22:28:14,137 [myid:127.0.0.1:27395] - INFO  [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1236] - Unable to read additional data from server sessionid       0x30000a9ccd20000, likely server has closed socket, closing socket connection and attempting reconnect
> 64447     [junit] 2018-11-25 22:28:14,240 [myid:] - INFO  [main:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED testRolloverThenLeaderRestart
> 64448     [junit] org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /foofoofoo-connected
> 64449     [junit]     at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> 64450     [junit]     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
> 64451     [junit]     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1942)
> 64452     [junit]     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1970)
> 64453     [junit]     at org.apache.zookeeper.server.ZxidRolloverTest.checkClientConnected(ZxidRolloverTest.java:119)
> 64454     [junit]     at org.apache.zookeeper.server.ZxidRolloverTest.checkClientsConnected(ZxidRolloverTest.java:90)
> 64455     [junit]     at org.apache.zookeeper.server.ZxidRolloverTest.start(ZxidRolloverTest.java:165)
> 64456     [junit]     at org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart(ZxidRolloverTest.java:382){noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)