Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/06/12 18:52:00 UTC

[jira] [Commented] (KUDU-2302) Leader crashes if it can't resolve DNS address of a peer

    [ https://issues.apache.org/jira/browse/KUDU-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362395#comment-17362395 ] 

ASF subversion and git services commented on KUDU-2302:
-------------------------------------------------------

Commit f9647149a49ddb87ea0ecf069eab3b5ec0217136 in kudu's branch refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=f964714 ]

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Reviewed-on: http://gerrit.cloudera.org:8080/17585
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
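
The approach described in the commit message (start a Peer without a
proxy, then retry building the proxy the next time one is needed) can be
illustrated with a minimal sketch. This is not the code from the patch:
LazyPeer, Proxy, Resolver, and every method name below are hypothetical
stand-ins, and the "RPC" is only a counter. It assumes the resolver reports
failure by returning false instead of aborting.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an RPC proxy bound to a resolved address.
struct Proxy {
  std::string resolved_addr;
};

// Hypothetical resolver: writes the resolved address to *addr and returns
// true, or returns false if the host cannot be resolved right now.
using Resolver = std::function<bool(const std::string& host, std::string* addr)>;

class LazyPeer {
 public:
  LazyPeer(std::string host, Resolver resolver)
      : host_(std::move(host)), resolver_(std::move(resolver)) {
    // Best-effort attempt at construction time. A failure here is not
    // fatal; the peer simply starts without a proxy.
    TryCreateProxy();
  }

  bool has_proxy() const { return proxy_ != nullptr; }
  int sent() const { return sent_; }
  int failed_attempts() const { return failed_attempts_; }

  // Returns true if the request was "sent". Returns false if the proxy
  // still cannot be built, so the caller can treat this as a failed RPC
  // and try again on the next round.
  bool SendRequest() {
    if (!proxy_ && !TryCreateProxy()) {
      ++failed_attempts_;
      return false;
    }
    assert(proxy_ != nullptr);
    ++sent_;  // A real implementation would issue the RPC here.
    return true;
  }

 private:
  // Tries to resolve the host and build the proxy; leaves proxy_ unset
  // on failure.
  bool TryCreateProxy() {
    std::string addr;
    if (!resolver_(host_, &addr)) {
      return false;
    }
    proxy_.reset(new Proxy{addr});
    return true;
  }

  std::string host_;
  Resolver resolver_;
  std::unique_ptr<Proxy> proxy_;
  int sent_ = 0;
  int failed_attempts_ = 0;
};
```

In this sketch a resolution failure surfaces as a failed send rather than
a process abort, and each later SendRequest() call doubles as the retry.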


> Leader crashes if it can't resolve DNS address of a peer
> --------------------------------------------------------
>
>                 Key: KUDU-2302
>                 URL: https://issues.apache.org/jira/browse/KUDU-2302
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, master, tserver
>    Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0
>            Reporter: Todd Lipcon
>            Assignee: Andrew Wong
>            Priority: Critical
>              Labels: crash, roadmap-candidate, stability
>
> In BecomeLeader we call:
> {code}
>  CHECK_OK(BecomeLeaderUnlocked());
> {code}
> This CHECK fails, crashing the process, if the new leader cannot resolve the address of one of its peers. Instead, it should probably continue as leader and treat RPC attempts to that peer as failed due to network resolution, retrying resolution periodically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)