Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/09/21 23:11:00 UTC

[jira] [Commented] (KUDU-1885) Master caches DNS name resolution forever

    [ https://issues.apache.org/jira/browse/KUDU-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418349#comment-17418349 ] 

ASF subversion and git services commented on KUDU-1885:
-------------------------------------------------------

Commit 41ebabf2eb618b33fd30ad1821ccbda9d6390010 in kudu's branch refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=41ebabf ]

[rpc] KUDU-75: refresh DNS entries if proxies hit a network error

This patch aims to tackle the following issues that revolve around
changes in addresses at runtime.
- KUDU-1885: the master's long-lived tserver proxies need to be
  re-resolved in case nodes are assigned different addresses; today we
  just retry at the same address forever.
- KUDU-1620: long-lived tablet consensus proxies need to be re-resolved
  on failure.
- C++ clients' usages of RemoteTabletServer also have long-lived proxies
  and are likely to run into similar problems if tservers are restarted
  and assigned new physical addresses.

It addresses this by plumbing a DnsResolver into the rpc::Proxy class.
When a call through the proxy hits a network error, the proxy chains the
call's asynchronous callback to an asynchronous refresh of the address,
using the newly introduced refresh capability of the DnsResolver.
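
To make the chaining concrete, here is a minimal, self-contained sketch of
one way to read the description above. It is not Kudu's code: FakeDnsResolver,
ReResolvingProxy, Transport, and Status are hypothetical stand-ins, the
callbacks run inline rather than on reactor threads, and the choice to hand
the original error back to the caller after refreshing (instead of retrying
inside the proxy) is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are not Kudu's classes.
enum class Status { kOk, kNetworkError };

// Toy resolver that caches host -> address and can refresh a cached entry.
class FakeDnsResolver {
 public:
  // The "authoritative" records, as a real DNS server would hold them.
  std::map<std::string, std::string> records;

  // Hands the cached address to 'cb', resolving on first use.
  void ResolveAsync(const std::string& host,
                    std::function<void(const std::string&)> cb) {
    auto it = cache_.find(host);
    if (it == cache_.end()) {
      it = cache_.emplace(host, records[host]).first;
    }
    cb(it->second);
  }

  // Drops the cached entry, then resolves again.
  void RefreshAsync(const std::string& host,
                    std::function<void(const std::string&)> cb) {
    cache_.erase(host);
    ResolveAsync(host, std::move(cb));
  }

 private:
  std::map<std::string, std::string> cache_;
};

// Stand-in for the network: attempts a call against the given address.
using Transport = std::function<Status(const std::string& addr)>;

// Long-lived proxy that remembers its address and re-resolves it whenever
// a call fails with a network error.
class ReResolvingProxy {
 public:
  ReResolvingProxy(FakeDnsResolver* resolver, std::string host, Transport send)
      : resolver_(resolver), host_(std::move(host)), send_(std::move(send)) {
    assert(resolver_ != nullptr);
  }

  void CallAsync(std::function<void(Status)> user_cb) {
    if (addr_.empty()) {
      // First use: resolve the address, then send.
      resolver_->ResolveAsync(host_, [this, user_cb](const std::string& addr) {
        addr_ = addr;
        Send(user_cb);
      });
      return;
    }
    Send(user_cb);
  }

  const std::string& address() const { return addr_; }

 private:
  void Send(const std::function<void(Status)>& user_cb) {
    const Status s = send_(addr_);
    if (s != Status::kNetworkError) {
      user_cb(s);
      return;
    }
    // Network error: refresh the address first, then run the user's
    // callback with the original error so the caller decides on a retry.
    resolver_->RefreshAsync(host_,
                            [this, user_cb, s](const std::string& fresh) {
                              addr_ = fresh;
                              user_cb(s);
                            });
  }

  FakeDnsResolver* resolver_;
  std::string host_;
  Transport send_;
  std::string addr_;
};
```

In this sketch, the first call after an address change still fails, but it
leaves the proxy pointing at the refreshed address, so a retry by the caller
can succeed instead of hitting the stale location forever.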

The new style of proxy isn't currently used, but a test is added
exercising the new functionality.

Change-Id: I777d169bd3a461294e5721f05071b726ced70f7e
Reviewed-on: http://gerrit.cloudera.org:8080/17839
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>


> Master caches DNS name resolution forever
> -----------------------------------------
>
>                 Key: KUDU-1885
>                 URL: https://issues.apache.org/jira/browse/KUDU-1885
>             Project: Kudu
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.3.0
>            Reporter: Adar Dembo
>            Priority: Major
>
> TSDescriptor::GetTSAdminProxy() and TSDescriptor::GetConsensusProxy() will return the same proxy instances over and over. Normally, this is a reasonable optimization. But suppose the IP address of the tserver changes (due to a DHCP lease expiring or some such). Now these methods will be returning unusable proxies, and there's no way to "reset" them.
> Admittedly this scenario is a little contrived: if a tserver's IP address suddenly changes, a bunch of other stuff will break too. The tserver will probably need to be restarted (since it's bound to a socket whose address no longer exists), and consensus may be thoroughly wrecked due to built-in host/port assumptions (see KUDU-418).
> An issue like this was reported by a user in Slack, who was running a master and tserver on the same box. The symptom was "half-open" communication between them: the tserver could heartbeat to the master, but the master could not send RPCs to the tserver.


