You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by al...@apache.org on 2020/06/24 04:22:56 UTC

[kudu] 03/03: [mini-cluster] fix GetLeaderMasterIndex()

This is an automated email from the ASF dual-hosted git repository.

alexey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit cd37d7cbcae4ff9112ee66123ead62b404f656ef
Author: Alexey Serbin <al...@apache.org>
AuthorDate: Tue Jun 23 15:51:20 2020 -0700

    [mini-cluster] fix GetLeaderMasterIndex()
    
    Prior to this patch, when running external mini-cluster in
    BindMode::UNIQUE_LOOPBACK mode, in rare cases masters might be bound
    to the same port number at different loopback IP addresses.  In such
    cases, GetLeaderMasterIndex() might return incorrect results.
    
    I saw a manifestation of such an issue with failure of the
    MultiMasterIdleConnectionsITest.ClientReacquiresAuthnToken scenario at
    http://dist-test.cloudera.org/job?job_id=jenkins-slave.1592866280.18987
    
    I think in that run the leader master index was detected as 0 (due to
    the issue this patch fixes), but in reality the index was 1, so all
    the attempts to start an election with current leader master didn't
    change the leadership role, where the leader master output messages
    like the following for every RunLeaderElectionRequest() call:
    
      I0622 22:58:08.909113 20844 raft_consensus.cc:461] T 00000000000000000000000000000000 P d31fb2325643457e8787dc92f8c3a4cf [term 1 LEADER]: Not starting forced leader election -- already a leader
      src/kudu/integration-tests/auth_token_expire-itest.cc:587: Failure
    
    Eventually, the test failed with the message:
      Expected: (former_leader_master_idx) != (idx), actual: 1 vs 1
    
    Change-Id: I3f445e587fe2152efbb67e4a36d36c760b486dac
    Reviewed-on: http://gerrit.cloudera.org:8080/16106
    Tested-by: Alexey Serbin <as...@cloudera.com>
    Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
 src/kudu/mini-cluster/external_mini_cluster.cc | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/kudu/mini-cluster/external_mini_cluster.cc b/src/kudu/mini-cluster/external_mini_cluster.cc
index c2b649a..daff355 100644
--- a/src/kudu/mini-cluster/external_mini_cluster.cc
+++ b/src/kudu/mini-cluster/external_mini_cluster.cc
@@ -757,6 +757,7 @@ Status ExternalMiniCluster::GetLeaderMasterIndex(int* idx) {
   Synchronizer sync;
   vector<pair<Sockaddr, string>> addrs_with_names;
   Sockaddr leader_master_addr;
+  string leader_master_hostname;
   MonoTime deadline = MonoTime::Now() + MonoDelta::FromSeconds(5);
 
   for (const scoped_refptr<ExternalMaster>& master : masters_) {
@@ -764,9 +765,10 @@ Status ExternalMiniCluster::GetLeaderMasterIndex(int* idx) {
   }
   const auto& cb = [&](const Status& status,
                        const pair<Sockaddr, string>& leader_master,
-                       const master::ConnectToMasterResponsePB& resp) {
+                       const master::ConnectToMasterResponsePB& /*resp*/) {
     if (status.ok()) {
       leader_master_addr = leader_master.first;
+      leader_master_hostname = leader_master.second;
     }
     sync.StatusCB(status);
   };
@@ -782,7 +784,13 @@ Status ExternalMiniCluster::GetLeaderMasterIndex(int* idx) {
   RETURN_NOT_OK(sync.Wait());
   bool found = false;
   for (int i = 0; i < masters_.size(); i++) {
-    if (masters_[i]->bound_rpc_hostport().port() == leader_master_addr.port()) {
+    const auto& bound_hp = masters_[i]->bound_rpc_hostport();
+    // If using BindMode::UNIQUE_LOOPBACK mode, in rare cases different masters
+    // might bind to different local IP addresses but use same port numbers.
+    // So, it's necessary to check both the returned hostnames and IP addresses
+    // to point to leader master.
+    if (bound_hp.port() == leader_master_addr.port() &&
+        bound_hp.host() == leader_master_hostname) {
       found = true;
       *idx = i;
       break;