You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by afine <gi...@git.apache.org> on 2017/12/13 01:11:32 UTC

[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...

GitHub user afine opened a pull request:

    https://github.com/apache/zookeeper/pull/432

    [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeaderEstablishment

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/afine/zookeeper ZOOKEEPER-2953

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #432
    
----
commit 68a0382093da1d64583211746ac672ed12f5da5c
Author: Abraham Fine <af...@apache.org>
Date:   2017-12-13T00:35:50Z

    ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeaderEstablishment

----


---

[GitHub] zookeeper pull request #432: ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLead...

Posted by phunt <gi...@git.apache.org>.
Github user phunt commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/432#discussion_r157326074
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
    @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception {
                     output[0], 2);
         }
     
    +    /**
    +     * This test validates that if a quorum member determines that it is leader without the support of the rest of the
    +     * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower.
    +     *
    +     * @throws IOException
    +     * @throws InterruptedException
    +     */
    +    @Test
    +    public void testElectionFraud() throws IOException, InterruptedException {
    +        // capture QuorumPeer logging
    +        Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout();
    +        ByteArrayOutputStream os = new ByteArrayOutputStream();
    +        WriterAppender appender = new WriterAppender(layout, os);
    +        appender.setThreshold(Level.INFO);
    +        Logger qlogger = Logger.getLogger(QuorumPeer.class);
    +        qlogger.addAppender(appender);
    +
    +        numServers = 3;
    +
    +        // used for assertions later
    +        boolean foundLeading = false;
    +        boolean foundLooking = false;
    +        boolean foundFollowing = false;
    +
    +        try {
    +          // spin up a quorum, we use a small ticktime to make the test run faster
    +          servers = LaunchServers(numServers, 500);
    --- End diff --
    
    Note that by reducing this you are also affecting the init and sync limits  in the same proportion... Not a reason not to do it but FYI in case we start seeing this test as flakey down the road. :-)


---

[GitHub] zookeeper issue #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeade...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/432
  
    @afine got it


---

[GitHub] zookeeper pull request #432: ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLead...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zookeeper/pull/432


---

[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...

Posted by afine <gi...@git.apache.org>.
Github user afine closed the pull request at:

    https://github.com/apache/zookeeper/pull/432


---

[GitHub] zookeeper issue #432: ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeaderEstab...

Posted by phunt <gi...@git.apache.org>.
Github user phunt commented on the issue:

    https://github.com/apache/zookeeper/pull/432
  
    lgtm, +1 thanks @afine 


---

[GitHub] zookeeper issue #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeade...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/432
  
    @afine Generally the new test looks good to me, but would you please elaborate why the old test was flaky and how the new fixes that?


---

[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/432#discussion_r156660196
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
    @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception {
                     output[0], 2);
         }
     
    +    /**
    +     * This test validates that if a quorum member determines that it is leader without the support of the rest of the
    +     * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower.
    +     *
    +     * @throws IOException
    +     * @throws InterruptedException
    +     */
    +    @Test
    +    public void testElectionFraud() throws IOException, InterruptedException {
    +        // capture QuorumPeer logging
    +        Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout();
    +        ByteArrayOutputStream os = new ByteArrayOutputStream();
    +        WriterAppender appender = new WriterAppender(layout, os);
    +        appender.setThreshold(Level.INFO);
    +        Logger qlogger = Logger.getLogger(QuorumPeer.class);
    +        qlogger.addAppender(appender);
    +
    +        int numServers = 3;
    +
    +        // used for assertions later
    +        boolean foundLeading = false;
    +        boolean foundLooking = false;
    +        boolean foundFollowing = false;
    +
    +        try {
    +          // spin up a quorum, we use a small ticktime to make the test run faster
    +          Servers servers = LaunchServers(numServers, 500);
    +
    +          // find the leader
    +          int trueLeader = -1;
    +          for (int i = 0; i < numServers; i++) {
    +            if (servers.mt[i].main.quorumPeer.leader != null) {
    +              trueLeader = i;
    +            }
    +          }
    +          Assert.assertTrue("There should be a leader", trueLeader >= 0);
    +
    +          // find a follower
    +          int falseLeader = (trueLeader + 1) % numServers;
    +          Assert.assertTrue(servers.mt[falseLeader].main.quorumPeer.follower != null);
    +
    +          // to keep the quorum peer running and force it to go into the looking state, we kill leader election
    +          // and close the connection to the leader
    +          servers.mt[falseLeader].main.quorumPeer.electionAlg.shutdown();
    +          servers.mt[falseLeader].main.quorumPeer.follower.getSocket().close();
    +
    +          // wait for the falseLeader to disconnect
    +          waitForOne(servers.zk[falseLeader], States.CONNECTING);
    +
    +          // convince falseLeader that it is the leader
    +          servers.mt[falseLeader].main.quorumPeer.setPeerState(QuorumPeer.ServerState.LEADING);
    +
    +          // provide time for the falseleader to realize no followers have connected
    +          // (this is twice the timeout used in Leader#getEpochToPropose)
    +          Thread.sleep(2 * servers.mt[falseLeader].main.quorumPeer.initLimit * servers.mt[falseLeader].main.quorumPeer.tickTime);
    +
    +          // Restart leader election
    +          servers.mt[falseLeader].main.quorumPeer.startLeaderElection();
    --- End diff --
    
    I wonder if this one has to be called explicitly. 
    
    The peer should automatically realise that it's no longer the leader, stop leading and start leader election (which is basically looking). That's what this test is intended to validate and shouldn't be started explicitly.
    
    Correct me if I'm wrong.


---

[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...

Posted by afine <gi...@git.apache.org>.
Github user afine commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/432#discussion_r156781040
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
    @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception {
                     output[0], 2);
         }
     
    +    /**
    +     * This test validates that if a quorum member determines that it is leader without the support of the rest of the
    +     * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower.
    +     *
    +     * @throws IOException
    +     * @throws InterruptedException
    +     */
    +    @Test
    +    public void testElectionFraud() throws IOException, InterruptedException {
    +        // capture QuorumPeer logging
    +        Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout();
    +        ByteArrayOutputStream os = new ByteArrayOutputStream();
    +        WriterAppender appender = new WriterAppender(layout, os);
    +        appender.setThreshold(Level.INFO);
    +        Logger qlogger = Logger.getLogger(QuorumPeer.class);
    +        qlogger.addAppender(appender);
    +
    +        int numServers = 3;
    +
    +        // used for assertions later
    +        boolean foundLeading = false;
    +        boolean foundLooking = false;
    +        boolean foundFollowing = false;
    +
    +        try {
    +          // spin up a quorum, we use a small ticktime to make the test run faster
    +          Servers servers = LaunchServers(numServers, 500);
    +
    +          // find the leader
    +          int trueLeader = -1;
    +          for (int i = 0; i < numServers; i++) {
    +            if (servers.mt[i].main.quorumPeer.leader != null) {
    +              trueLeader = i;
    +            }
    +          }
    +          Assert.assertTrue("There should be a leader", trueLeader >= 0);
    +
    +          // find a follower
    +          int falseLeader = (trueLeader + 1) % numServers;
    +          Assert.assertTrue(servers.mt[falseLeader].main.quorumPeer.follower != null);
    +
    +          // to keep the quorum peer running and force it to go into the looking state, we kill leader election
    +          // and close the connection to the leader
    +          servers.mt[falseLeader].main.quorumPeer.electionAlg.shutdown();
    +          servers.mt[falseLeader].main.quorumPeer.follower.getSocket().close();
    +
    +          // wait for the falseLeader to disconnect
    +          waitForOne(servers.zk[falseLeader], States.CONNECTING);
    +
    +          // convince falseLeader that it is the leader
    +          servers.mt[falseLeader].main.quorumPeer.setPeerState(QuorumPeer.ServerState.LEADING);
    +
    +          // provide time for the falseleader to realize no followers have connected
    +          // (this is twice the timeout used in Leader#getEpochToPropose)
    +          Thread.sleep(2 * servers.mt[falseLeader].main.quorumPeer.initLimit * servers.mt[falseLeader].main.quorumPeer.tickTime);
    +
    +          // Restart leader election
    +          servers.mt[falseLeader].main.quorumPeer.startLeaderElection();
    --- End diff --
    
    Stopping and starting leader election is necessary here to prevent a race condition. It is possible that after the server is disconnected from the leader it becomes a follower before the test hits `servers.mt[falseLeader].main.quorumPeer.setPeerState(QuorumPeer.ServerState.LEADING);` and falseLeader will never try to `lead`, defeating the purpose of the test.


---

[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/432#discussion_r156793172
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
    @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception {
                     output[0], 2);
         }
     
    +    /**
    +     * This test validates that if a quorum member determines that it is leader without the support of the rest of the
    +     * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower.
    +     *
    +     * @throws IOException
    +     * @throws InterruptedException
    +     */
    +    @Test
    +    public void testElectionFraud() throws IOException, InterruptedException {
    +        // capture QuorumPeer logging
    +        Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout();
    +        ByteArrayOutputStream os = new ByteArrayOutputStream();
    +        WriterAppender appender = new WriterAppender(layout, os);
    +        appender.setThreshold(Level.INFO);
    +        Logger qlogger = Logger.getLogger(QuorumPeer.class);
    +        qlogger.addAppender(appender);
    +
    +        int numServers = 3;
    +
    +        // used for assertions later
    +        boolean foundLeading = false;
    +        boolean foundLooking = false;
    +        boolean foundFollowing = false;
    +
    +        try {
    +          // spin up a quorum, we use a small ticktime to make the test run faster
    +          Servers servers = LaunchServers(numServers, 500);
    +
    +          // find the leader
    +          int trueLeader = -1;
    +          for (int i = 0; i < numServers; i++) {
    +            if (servers.mt[i].main.quorumPeer.leader != null) {
    +              trueLeader = i;
    +            }
    +          }
    +          Assert.assertTrue("There should be a leader", trueLeader >= 0);
    +
    +          // find a follower
    +          int falseLeader = (trueLeader + 1) % numServers;
    +          Assert.assertTrue(servers.mt[falseLeader].main.quorumPeer.follower != null);
    +
    +          // to keep the quorum peer running and force it to go into the looking state, we kill leader election
    +          // and close the connection to the leader
    +          servers.mt[falseLeader].main.quorumPeer.electionAlg.shutdown();
    +          servers.mt[falseLeader].main.quorumPeer.follower.getSocket().close();
    +
    +          // wait for the falseLeader to disconnect
    +          waitForOne(servers.zk[falseLeader], States.CONNECTING);
    +
    +          // convince falseLeader that it is the leader
    +          servers.mt[falseLeader].main.quorumPeer.setPeerState(QuorumPeer.ServerState.LEADING);
    +
    +          // provide time for the falseleader to realize no followers have connected
    +          // (this is twice the timeout used in Leader#getEpochToPropose)
    +          Thread.sleep(2 * servers.mt[falseLeader].main.quorumPeer.initLimit * servers.mt[falseLeader].main.quorumPeer.tickTime);
    +
    +          // Restart leader election
    +          servers.mt[falseLeader].main.quorumPeer.startLeaderElection();
    --- End diff --
    
    I see. Hence the shutdown() call a few lines before.
    Makes sense.


---

[GitHub] zookeeper pull request #432: ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLead...

Posted by afine <gi...@git.apache.org>.
Github user afine commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/432#discussion_r156999341
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
    @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception {
                     output[0], 2);
         }
     
    +    /**
    +     * This test validates that if a quorum member determines that it is leader without the support of the rest of the
    +     * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower.
    +     *
    +     * @throws IOException
    +     * @throws InterruptedException
    +     */
    +    @Test
    +    public void testElectionFraud() throws IOException, InterruptedException {
    +        // capture QuorumPeer logging
    +        Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout();
    +        ByteArrayOutputStream os = new ByteArrayOutputStream();
    +        WriterAppender appender = new WriterAppender(layout, os);
    +        appender.setThreshold(Level.INFO);
    +        Logger qlogger = Logger.getLogger(QuorumPeer.class);
    +        qlogger.addAppender(appender);
    +
    +        int numServers = 3;
    +
    +        // used for assertions later
    +        boolean foundLeading = false;
    +        boolean foundLooking = false;
    +        boolean foundFollowing = false;
    +
    +        try {
    +          // spin up a quorum, we use a small ticktime to make the test run faster
    +          Servers servers = LaunchServers(numServers, 500);
    +
    +          // find the leader
    +          int trueLeader = -1;
    +          for (int i = 0; i < numServers; i++) {
    +            if (servers.mt[i].main.quorumPeer.leader != null) {
    +              trueLeader = i;
    +            }
    +          }
    +          Assert.assertTrue("There should be a leader", trueLeader >= 0);
    +
    +          // find a follower
    +          int falseLeader = (trueLeader + 1) % numServers;
    +          Assert.assertTrue(servers.mt[falseLeader].main.quorumPeer.follower != null);
    --- End diff --
    
    fixed


---

[GitHub] zookeeper issue #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeade...

Posted by afine <gi...@git.apache.org>.
Github user afine commented on the issue:

    https://github.com/apache/zookeeper/pull/432
  
    @anmolnar An explanation is coming, please excuse the cruft at this stage. I wanted to run the new test on apache hardware hence the [WIP] in the title.


---

[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...

Posted by afine <gi...@git.apache.org>.
GitHub user afine reopened a pull request:

    https://github.com/apache/zookeeper/pull/432

    [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeaderEstablishment

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/afine/zookeeper ZOOKEEPER-2953

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #432
    
----
commit 68a0382093da1d64583211746ac672ed12f5da5c
Author: Abraham Fine <af...@apache.org>
Date:   2017-12-13T00:35:50Z

    ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeaderEstablishment

----


---

[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...

Posted by afine <gi...@git.apache.org>.
Github user afine commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/432#discussion_r156780239
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
    @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception {
                     output[0], 2);
         }
     
    +    /**
    +     * This test validates that if a quorum member determines that it is leader without the support of the rest of the
    +     * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower.
    +     *
    +     * @throws IOException
    +     * @throws InterruptedException
    +     */
    +    @Test
    +    public void testElectionFraud() throws IOException, InterruptedException {
    +        // capture QuorumPeer logging
    +        Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout();
    +        ByteArrayOutputStream os = new ByteArrayOutputStream();
    +        WriterAppender appender = new WriterAppender(layout, os);
    +        appender.setThreshold(Level.INFO);
    +        Logger qlogger = Logger.getLogger(QuorumPeer.class);
    +        qlogger.addAppender(appender);
    +
    +        int numServers = 3;
    +
    +        // used for assertions later
    +        boolean foundLeading = false;
    +        boolean foundLooking = false;
    +        boolean foundFollowing = false;
    +
    +        try {
    +          // spin up a quorum, we use a small ticktime to make the test run faster
    +          Servers servers = LaunchServers(numServers, 500);
    --- End diff --
    
    Fixed, forgot to put that in this branch.


---

[GitHub] zookeeper issue #432: ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeaderEstab...

Posted by afine <gi...@git.apache.org>.
Github user afine commented on the issue:

    https://github.com/apache/zookeeper/pull/432
  
    [WIP] removed and I added a description of the issue to the JIRA.


---

[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/432#discussion_r156658012
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
    @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception {
                     output[0], 2);
         }
     
    +    /**
    +     * This test validates that if a quorum member determines that it is leader without the support of the rest of the
    +     * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower.
    +     *
    +     * @throws IOException
    +     * @throws InterruptedException
    +     */
    +    @Test
    +    public void testElectionFraud() throws IOException, InterruptedException {
    +        // capture QuorumPeer logging
    +        Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout();
    +        ByteArrayOutputStream os = new ByteArrayOutputStream();
    +        WriterAppender appender = new WriterAppender(layout, os);
    +        appender.setThreshold(Level.INFO);
    +        Logger qlogger = Logger.getLogger(QuorumPeer.class);
    +        qlogger.addAppender(appender);
    +
    +        int numServers = 3;
    +
    +        // used for assertions later
    +        boolean foundLeading = false;
    +        boolean foundLooking = false;
    +        boolean foundFollowing = false;
    +
    +        try {
    +          // spin up a quorum, we use a small ticktime to make the test run faster
    +          Servers servers = LaunchServers(numServers, 500);
    +
    +          // find the leader
    +          int trueLeader = -1;
    +          for (int i = 0; i < numServers; i++) {
    +            if (servers.mt[i].main.quorumPeer.leader != null) {
    +              trueLeader = i;
    +            }
    +          }
    +          Assert.assertTrue("There should be a leader", trueLeader >= 0);
    +
    +          // find a follower
    +          int falseLeader = (trueLeader + 1) % numServers;
    +          Assert.assertTrue(servers.mt[falseLeader].main.quorumPeer.follower != null);
    --- End diff --
    
    Minor: I believe that it's generally a good idea to add a short message to asserts to describe what could be the problem when assertion fails.


---

[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...

Posted by anmolnar <gi...@git.apache.org>.
Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/432#discussion_r156657072
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
    @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception {
                     output[0], 2);
         }
     
    +    /**
    +     * This test validates that if a quorum member determines that it is leader without the support of the rest of the
    +     * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower.
    +     *
    +     * @throws IOException
    +     * @throws InterruptedException
    +     */
    +    @Test
    +    public void testElectionFraud() throws IOException, InterruptedException {
    +        // capture QuorumPeer logging
    +        Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout();
    +        ByteArrayOutputStream os = new ByteArrayOutputStream();
    +        WriterAppender appender = new WriterAppender(layout, os);
    +        appender.setThreshold(Level.INFO);
    +        Logger qlogger = Logger.getLogger(QuorumPeer.class);
    +        qlogger.addAppender(appender);
    +
    +        int numServers = 3;
    +
    +        // used for assertions later
    +        boolean foundLeading = false;
    +        boolean foundLooking = false;
    +        boolean foundFollowing = false;
    +
    +        try {
    +          // spin up a quorum, we use a small ticktime to make the test run faster
    +          Servers servers = LaunchServers(numServers, 500);
    --- End diff --
    
    It'd be useful to utilize the class-wide ```servers``` field which is used to shutdown test servers in the ```tearDown()``` method.
    
    For example ```testHighestZxidJoinLate()``` test does it that way.


---