Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/03 17:57:40 UTC

[GitHub] [pulsar] MarvinCai commented on a change in pull request #9443: Try to fix flaky LeaderElection test.

MarvinCai commented on a change in pull request #9443:
URL: https://github.com/apache/pulsar/pull/9443#discussion_r569627190



##########
File path: pulsar-broker/src/test/java/org/apache/pulsar/broker/loadbalance/LoadBalancerTest.java
##########
@@ -176,23 +176,30 @@ void shutdown() throws Exception {
         bkEnsemble.stop();
     }
 
-    private LeaderBroker loopUntilLeaderChanges(LeaderElectionService les, LeaderBroker oldLeader,
-            LeaderBroker newLeader) throws InterruptedException {
+    private void loopUntilLeaderChangesForAllBroker(List<PulsarService> activePulsars, LeaderBroker oldLeader)
+            throws InterruptedException {
         int loopCount = 0;
+        boolean settled;
 
         while (loopCount < MAX_RETRIES) {
             Thread.sleep(1000);
-            // Check if the new leader is elected. If yes, break without incrementing the loopCount
-            newLeader = les.getCurrentLeader().get();
-            if (newLeader.equals(oldLeader) == false) {
+            settled = true;
+            // Check if the all active pulsar see a new leader
+            for (PulsarService pulsar : activePulsars) {
+                Optional<LeaderBroker> leader = pulsar.getLeaderElectionService().readCurrentLeader().join();
+                if (leader.isPresent() && leader.get().equals(oldLeader)) {

Review comment:
       @315157973 I think the old logic picked the last follower it had seen and checked whether that follower saw the new leader, which is this chunk of code ([ref1](https://github.com/apache/pulsar/blob/fd7da5210b59fe9fd7b2619534e8122ba7b2701a/pulsar-broker/src/test/java/org/apache/pulsar/broker/loadbalance/LoadBalancerTest.java#L730), [ref2](https://github.com/apache/pulsar/blob/fd7da5210b59fe9fd7b2619534e8122ba7b2701a/pulsar-broker/src/test/java/org/apache/pulsar/broker/loadbalance/LoadBalancerTest.java#L186-L188)).
   That check alone can't guarantee that every follower has already seen the new leader. A follower that tries to become leader first sees the empty path "/loadbalance/leader", caches that read as [Optional.empty], and then tries to create the znode. If the create fails (another node has already created the znode and become leader), there may be some delay before the ZooKeeper watch updates the follower's cache, so the old test could still see that [Optional.empty]. Looping through all followers and making sure each of them has already seen a new leader, then checking that they all see the same leader, should solve the problem.
   Does that make sense?
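
   To make the idea concrete, here is a minimal, self-contained sketch of the "wait until every follower agrees on a new leader" loop. It is not the code in this PR: `LeaderSettleCheck`, `awaitNewLeader`, and the `Supplier<Optional<String>>` views are hypothetical stand-ins for each broker's cached read of the leader, and treating an empty cached value as "not settled yet" is an assumption drawn from the race described above.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of Pulsar.
class LeaderSettleCheck {

    /**
     * Polls every broker's view of the leader until all of them report the same
     * present value that differs from oldLeader. Returns that leader, or
     * Optional.empty() if the views have not settled after maxRetries polls.
     */
    static Optional<String> awaitNewLeader(List<Supplier<Optional<String>>> views,
                                           String oldLeader,
                                           int maxRetries,
                                           long sleepMillis) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            String agreed = null;
            boolean settled = true;
            for (Supplier<Optional<String>> view : views) {
                Optional<String> leader = view.get();
                // A stale cache may still hold "empty" or the old leader.
                if (!leader.isPresent() || leader.get().equals(oldLeader)) {
                    settled = false;
                    break;
                }
                if (agreed == null) {
                    agreed = leader.get();
                } else if (!agreed.equals(leader.get())) {
                    // Two brokers disagree, so the election has not converged.
                    settled = false;
                    break;
                }
            }
            if (settled && agreed != null) {
                return Optional.of(agreed);
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up early.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
        return Optional.empty();
    }
}
```

   In a test, each view would wrap one broker's leader read, and the caller would fail the test if the result is empty. Checking agreement inside the same loop avoids a second pass that could race with a later cache update.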



