Posted to commits@zookeeper.apache.org by sy...@apache.org on 2021/03/09 17:08:21 UTC

[zookeeper] branch branch-3.5 updated: ZOOKEEPER-4220: Potential redundant connection attempts during leader election

This is an automated email from the ASF dual-hosted git repository.

symat pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/zookeeper.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 755cb27  ZOOKEEPER-4220: Potential redundant connection attempts during leader election
755cb27 is described below

commit 755cb27914589531b338f76cc7dae89775493ab4
Author: Mate Szalay-Beko <sy...@apache.org>
AuthorDate: Tue Mar 9 17:07:55 2021 +0000

    ZOOKEEPER-4220: Potential redundant connection attempts during leader election
    
    The server code contains logic that tries to connect to another quorum member based on its
    server ID. We first look up the address assigned to this ID in the last committed quorum
    configuration. If the connection attempt fails (or the server is not known in the committed
    configuration), we then look up the address in the last proposed quorum configuration.
    However, we should make the second connection attempt only if the address in the last
    proposed configuration differs from the address in the last committed configuration.
    Otherwise we would just retry the same address that failed a moment before.
    
    The current code has a bug: it compares the address object references (using "!=")
    instead of comparing the objects themselves (using "equals"). In certain edge cases (e.g.
    when the last proposed and last committed addresses are the same, but the address is
    unreachable), this bug can lead to unnecessary connection retries. The correct behaviour
    is to mark the connection attempt as failed and wait, e.g. for the next election round or
    for the other server to come online and initiate a connection to us.
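
The difference between the two comparisons can be reproduced in isolation. The sketch below is not ZooKeeper code: it assumes the election addresses are java.net.InetSocketAddress values (the commit message does not state the type), and the class name, method names, and host name are made up for illustration.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class, not part of ZooKeeper.
final class AddressComparisonDemo {

    // Reference comparison: true whenever the two variables point to
    // different objects, even if those objects describe the same address.
    static boolean differsByReference(InetSocketAddress proposed, InetSocketAddress committed) {
        return proposed != committed;
    }

    // Value comparison: true only if host or port actually differ.
    static boolean differsByValue(InetSocketAddress proposed, InetSocketAddress committed) {
        return !proposed.equals(committed);
    }

    public static void main(String[] args) {
        // createUnresolved avoids any DNS lookup; the host name is a placeholder.
        InetSocketAddress committed = InetSocketAddress.createUnresolved("zk2.example.org", 3888);
        InetSocketAddress proposed = InetSocketAddress.createUnresolved("zk2.example.org", 3888);

        System.out.println(differsByReference(proposed, committed)); // prints "true"
        System.out.println(differsByValue(proposed, committed));     // prints "false"
    }
}
```

With two distinct objects holding the same host and port, the reference check reports a difference while the value check does not, which is the case in which a redundant second attempt would be made.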
    
    Author: Mate Szalay-Beko <sy...@apache.org>
    
    Reviewers: Andor Molnar <an...@apache.org>, Damien Diederen <dd...@crosstwine.com>
    
    Closes #1631 from symat/ZOOKEEPER-4220-branch-3.5
---
 .../java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java  | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java b/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
index 066ea9f..64673f5 100644
--- a/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
+++ b/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
@@ -722,9 +722,10 @@ public class QuorumCnxManager {
                 if (connectOne(sid, lastCommittedView.get(sid).electionAddr))
                     return;
             }
-            if (lastSeenQV != null && lastProposedView.containsKey(sid)
-                    && (!knownId || (lastProposedView.get(sid).electionAddr !=
-                    lastCommittedView.get(sid).electionAddr))) {
+            if (lastSeenQV != null
+                && lastProposedView.containsKey(sid)
+                && (!knownId
+                    || !lastProposedView.get(sid).electionAddr.equals(lastCommittedView.get(sid).electionAddr))) {
                 knownId = true;
                 LOG.debug("Server {} knows {} already, it is in the lastProposedView", self.getId(), sid);
                 if (connectOne(sid, lastProposedView.get(sid).electionAddr))
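
For illustration, the decision made by the fixed condition can be modelled outside ZooKeeper. This is a simplified sketch under stated assumptions, not the QuorumCnxManager implementation: the two views are reduced to plain maps from server ID to InetSocketAddress, the connection attempt is a caller-supplied predicate, and every name below (FallbackConnectSketch, connect, the host name) is hypothetical.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified model of the committed-then-proposed fallback.
final class FallbackConnectSketch {

    // Returns true if one of the attempts succeeded.
    static boolean connect(long sid,
                           Map<Long, InetSocketAddress> committedView,
                           Map<Long, InetSocketAddress> proposedView,
                           Predicate<InetSocketAddress> connectOne) {
        boolean knownId = false;
        InetSocketAddress committedAddr = committedView.get(sid);
        if (committedAddr != null) {
            knownId = true;
            if (connectOne.test(committedAddr)) {
                return true;
            }
        }
        InetSocketAddress proposedAddr = proposedView.get(sid);
        // Second attempt only if the server was unknown in the committed view
        // or the proposed address differs by value from the committed one.
        if (proposedAddr != null && (!knownId || !proposedAddr.equals(committedAddr))) {
            return connectOne.test(proposedAddr);
        }
        return false;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> attempts = new ArrayList<>();
        // Simulates an unreachable peer: records the attempt and fails.
        Predicate<InetSocketAddress> unreachable = addr -> {
            attempts.add(addr);
            return false;
        };
        Map<Long, InetSocketAddress> committed = Collections.singletonMap(
                2L, InetSocketAddress.createUnresolved("zk2.example.org", 3888));
        Map<Long, InetSocketAddress> proposed = Collections.singletonMap(
                2L, InetSocketAddress.createUnresolved("zk2.example.org", 3888));

        connect(2L, committed, proposed, unreachable);
        System.out.println(attempts.size()); // prints "1"
    }
}
```

When both views hold the same unreachable address, only one attempt is recorded. Replacing the equals call with a reference comparison in this sketch would record two attempts against the same address.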