Posted to issues@kudu.apache.org by "Bankim Bhavsar (Jira)" <ji...@apache.org> on 2021/08/27 19:44:00 UTC

[jira] [Created] (KUDU-3312) SetPermanentUuidForRemotePeer() isn't resilient to DNS resolution failure

Bankim Bhavsar created KUDU-3312:
------------------------------------

             Summary: SetPermanentUuidForRemotePeer() isn't resilient to DNS resolution failure
                 Key: KUDU-3312
                 URL: https://issues.apache.org/jira/browse/KUDU-3312
             Project: Kudu
          Issue Type: Improvement
          Components: consensus, master
            Reporter: Bankim Bhavsar


When bringing up a new Kudu cluster with multiple masters, the masters must be brought up together and should start within a short time window of 30 seconds (FLAGS_raft_get_node_instance_timeout_ms).

However, when bringing up multiple masters on Kubernetes, we noticed that bring-up sometimes fails because the masters aren't started together within that short time window. Simply setting FLAGS_raft_get_node_instance_timeout_ms to a higher timeout didn't help in some cases, because DNS resolution would fail in SetPermanentUuidForRemotePeer() at the very beginning.

{code}
 E0827 19:28:53.052981 91 master.cc:279] Unable to init master catalog manager: Network error: Unable to initialize catalog manager: Failed to initialize sys tables async: Failed to create new distributed Raft config: Unable to resolve UUID for peer member_type: VOTER last_known_addr { host: "kudu-master-0.kudu-masters.warehouse-1630092493-z2sz.svc.cluster.local" port: 7051 }: unable to resolve address for kudu-master-0.kudu-masters.warehouse-1630092493-z2sz.svc.cluster.local: Name or service not known
{code}

So SetPermanentUuidForRemotePeer() needs to retry on proxy creation/DNS resolution failures in addition to RPC request failures.
https://github.com/apache/kudu/blob/master/src/kudu/consensus/consensus_peers.cc#L627
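
To illustrate the kind of change being proposed, here is a minimal, self-contained sketch of a retry loop that treats the whole attempt (DNS resolution, proxy creation and the RPC) as one retriable unit bounded by a single deadline. This is not Kudu code: AttemptResult and RetryUntilDeadline are hypothetical names invented for this example, and the backoff policy is an assumption rather than what consensus_peers.cc does.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical stand-in for a status object; not kudu::Status.
struct AttemptResult {
  bool ok;
  std::string message;
};

// Hypothetical helper: calls `attempt` until it succeeds or `timeout`
// elapses, sleeping with capped exponential backoff between tries.
// `attempt` is expected to cover every step (DNS resolution, proxy
// creation, RPC), so a failure in any of them is retried.
inline AttemptResult RetryUntilDeadline(
    const std::function<AttemptResult()>& attempt,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds initial_backoff,
    std::chrono::milliseconds max_backoff,
    int* attempts_made = nullptr) {
  assert(static_cast<bool>(attempt));
  using Clock = std::chrono::steady_clock;
  const auto deadline = Clock::now() + timeout;
  auto backoff = initial_backoff;
  int attempts = 0;
  AttemptResult last{false, "no attempt made"};
  while (true) {
    last = attempt();
    ++attempts;
    if (last.ok) break;
    const auto remaining =
        std::chrono::duration_cast<std::chrono::milliseconds>(
            deadline - Clock::now());
    // Give up once the deadline has passed; return the last failure.
    if (remaining.count() <= 0) break;
    std::this_thread::sleep_for(std::min(backoff, remaining));
    backoff = std::min(backoff * 2, max_backoff);
  }
  if (attempts_made != nullptr) *attempts_made = attempts;
  return last;
}
```

In this sketch the caller would pass a callable that resolves the peer's address, creates the proxy and issues the request, with the timeout taken from FLAGS_raft_get_node_instance_timeout_ms. A transient "Name or service not known" error would then be retried the same way as a failed RPC, instead of aborting at the very beginning.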
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)