You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2018/03/14 20:51:45 UTC

[kudu-CR](branch-1.5.x) KUDU-2343. java: properly reconnect to new leader master after failover

Hello Alexey Serbin,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/9638

to review the following change.


Change subject: KUDU-2343. java: properly reconnect to new leader master after failover
......................................................................

KUDU-2343. java: properly reconnect to new leader master after failover

This fixes the "fake" location information returned in response to a
ConnectToMaster RPC to include a distinct "fake UUID" for each master.
Previously, we were using an empty string for the UUID of the masters.
This caused collisions in the ConnectionCache, which is keyed by server
UUIDs.

The fake UUID added by this patch matches the fake UUID already in use
by AsyncKuduClient.newMasterRpcProxy. This should allow us to share the
RPC connection between the ConnectToMaster RPCs and the subsequent
GetTableLocation RPCs, which is also a benefit for latency after a
failover or on a fresh client.

Additionally, this will help with various log messages that previously
would print an empty UUID string.

A prior version of this patch solved the problem by changing the key for
the ConnectionCache to be based on IP address, which has other benefits
in terms of future support for servers changing their DNS resolution at
runtime. However, since this patch is intended for backport into prior
releases, this simpler approach is taken for now. A TODO is added for
the longer-term idea.

An existing test which tested killing a master now runs in a second mode
which restarts the master. This reproduced the bug prior to the fix.
This patch also cleans up that test somewhat - it was doing some buggy
logic to attempt to kill more than one tablet server, but in fact just
called "killTabletServer" three times on the same one. Killing three
tablet servers never made sense, either, since the table in the test
only had three replicas. Neither did it make sense to start six tablet
servers for the test.

Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Reviewed-on: http://gerrit.cloudera.org:8080/9612
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Todd Lipcon <to...@apache.org>
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java
5 files changed, 63 insertions(+), 19 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/38/9638/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9638
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>

[kudu-CR](branch-1.5.x) KUDU-2343. java: properly reconnect to new leader master after failover

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9638 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after failover
......................................................................


Patch Set 1: Verified+1

Unrelated test failures on past branch


-- 
To view, visit http://gerrit.cloudera.org:8080/9638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9638
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 14 Mar 2018 23:46:11 +0000
Gerrit-HasComments: No

[kudu-CR](branch-1.5.x) KUDU-2343. java: properly reconnect to new leader master after failover

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2343. java: properly reconnect to new leader master after failover
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/9638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9638
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR](branch-1.5.x) KUDU-2343. java: properly reconnect to new leader master after failover

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9638 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after failover
......................................................................

KUDU-2343. java: properly reconnect to new leader master after failover

This fixes the "fake" location information returned in response to a
ConnectToMaster RPC to include a distinct "fake UUID" for each master.
Previously, we were using an empty string for the UUID of the masters.
This caused collisions in the ConnectionCache, which is keyed by server
UUIDs.

The fake UUID added by this patch matches the fake UUID already in use
by AsyncKuduClient.newMasterRpcProxy. This should allow us to share the
RPC connection between the ConnectToMaster RPCs and the subsequent
GetTableLocation RPCs, which is also a benefit for latency after a
failover or on a fresh client.

Additionally, this will help with various log messages that previously
would print an empty UUID string.

A prior version of this patch solved the problem by changing the key for
the ConnectionCache to be based on IP address, which has other benefits
in terms of future support for servers changing their DNS resolution at
runtime. However, since this patch is intended for backport into prior
releases, this simpler approach is taken for now. A TODO is added for
the longer-term idea.

An existing test which tested killing a master now runs in a second mode
which restarts the master. This reproduced the bug prior to the fix.
This patch also cleans up that test somewhat - it was doing some buggy
logic to attempt to kill more than one tablet server, but in fact just
called "killTabletServer" three times on the same one. Killing three
tablet servers never made sense, either, since the table in the test
only had three replicas. Neither did it make sense to start six tablet
servers for the test.

Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Reviewed-on: http://gerrit.cloudera.org:8080/9612
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Todd Lipcon <to...@apache.org>
Reviewed-on: http://gerrit.cloudera.org:8080/9638
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java
5 files changed, 63 insertions(+), 19 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Todd Lipcon: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9638
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.5.x) KUDU-2343. java: properly reconnect to new leader master after failover

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9638 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after failover
......................................................................


Patch Set 1: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9638
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Wed, 14 Mar 2018 20:52:30 +0000
Gerrit-HasComments: No