You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2019/05/10 18:14:15 UTC

[kudu-CR] [java] Fix handling of SERVICE UNAVAILABLE errors

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13308


Change subject: [java] Fix handling of SERVICE_UNAVAILABLE errors
......................................................................

[java] Fix handling of SERVICE_UNAVAILABLE errors

When the Java client sends a write, and it is rejected due to, e.g., a
soft memory limit error, it causes the client to invalidate the leader's
location for the tablet. The client then has to do a GetTableLocations
lookup on the master to refresh the cache. This is silly because the
location is valid and the leader is where the write must go. This patch
corrects the handling of certain RPC-level errors including the handling
of SERVICE_UNAVAILABLE. It will reduce greatly the number of unnecessary
calls to the master. These usually aren't a problem because master
lookups are fast, but it's a waste of time and resources nonetheless.

Change-Id: Id3437c779322e756a6e1fbc19f464f765741d58b
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
2 files changed, 14 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/13308/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id3437c779322e756a6e1fbc19f464f765741d58b
Gerrit-Change-Number: 13308
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [java] Fix handling of SERVICE UNAVAILABLE errors

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13308 )

Change subject: [java] Fix handling of SERVICE_UNAVAILABLE errors
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13308/1/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java:

http://gerrit.cloudera.org:8080/#/c/13308/1/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java@418
PS1, Line 418:     if (exception instanceof InvalidAuthnTokenException) {
> I program defensively in the Java client. If all callers take care of authn
If this is meant to be refactored, could you add a TODO here or at the callsite in responseReceived() mentioning that this duplication is intentional and defensive, and that it'd be nice to refactor this stuff to be more obviously correct?

I could see myself coming back to this without that context and being confused why we have this handling twice, when it seems to be exercised once (at least for now).



-- 
To view, visit http://gerrit.cloudera.org:8080/13308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3437c779322e756a6e1fbc19f464f765741d58b
Gerrit-Change-Number: 13308
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 13 May 2019 21:08:50 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] Fix handling of SERVICE UNAVAILABLE errors

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13308 )

Change subject: [java] Fix handling of SERVICE_UNAVAILABLE errors
......................................................................


Patch Set 1:

(1 comment)

Could we test this? Set up a test scenario where the client receives a service unavailable error, then prove (using RPC metrics?) that it doesn't bounce to the master for a new lookup?

http://gerrit.cloudera.org:8080/#/c/13308/1/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java:

http://gerrit.cloudera.org:8080/#/c/13308/1/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java@418
PS1, Line 418:     if (exception instanceof InvalidAuthnTokenException) {
Can these invalid token exceptions actually happen? It looks like the caller takes care of them.



-- 
To view, visit http://gerrit.cloudera.org:8080/13308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3437c779322e756a6e1fbc19f464f765741d58b
Gerrit-Change-Number: 13308
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 10 May 2019 21:19:59 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] Fix handling of SERVICE UNAVAILABLE errors

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13308 )

Change subject: [java] Fix handling of SERVICE_UNAVAILABLE errors
......................................................................


Patch Set 1: Code-Review+1

Please get a second review from a Java client expert, like Alexey or Andrew.


-- 
To view, visit http://gerrit.cloudera.org:8080/13308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3437c779322e756a6e1fbc19f464f765741d58b
Gerrit-Change-Number: 13308
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 13 May 2019 18:15:46 +0000
Gerrit-HasComments: No

[kudu-CR] [java] Fix handling of SERVICE UNAVAILABLE errors

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13308 )

Change subject: [java] Fix handling of SERVICE_UNAVAILABLE errors
......................................................................

[java] Fix handling of SERVICE_UNAVAILABLE errors

When the Java client sends a write, and it is rejected due to, e.g., a
soft memory limit error, it causes the client to invalidate the leader's
location for the tablet. The client then has to do a GetTableLocations
lookup on the master to refresh the cache. This is silly because the
location is valid and the leader is where the write must go. This patch
corrects the handling of certain RPC-level errors including the handling
of SERVICE_UNAVAILABLE. It will reduce greatly the number of unnecessary
calls to the master. These usually aren't a problem because master
lookups are fast, but it's a waste of time and resources nonetheless.

Change-Id: Id3437c779322e756a6e1fbc19f464f765741d58b
Reviewed-on: http://gerrit.cloudera.org:8080/13308
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
2 files changed, 14 insertions(+), 3 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, but someone else must approve
  Andrew Wong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/13308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id3437c779322e756a6e1fbc19f464f765741d58b
Gerrit-Change-Number: 13308
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [java] Fix handling of SERVICE UNAVAILABLE errors

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13308 )

Change subject: [java] Fix handling of SERVICE_UNAVAILABLE errors
......................................................................


Patch Set 1:

> (1 comment)
 > 
 > > (1 comment)
 > >
 > > > (1 comment)
 > > >
 > > > Could we test this? Set up a test scenario where the client
 > > > receives a service unavailable error, then prove (using RPC
 > > > metrics?) that it doesn't bounce to the master for a new
 > lookup?
 > >
 > > I worry that has flaky potential, e.g. if the service unavailable
 > > error is received, but then on retry the leader stepped down and
 > it
 > > triggers a master lookup. It'd be nicer if we had a good way to
 > > test this error handling code without needing a running cluster.
 > >
 > > FWIW, I was running a BDR test workload and seeing tons of silly
 > > GTLs logged by the client. I tried using the client with this
 > patch
 > > and saw those GTLs no more.
 > 
 > Maybe, using single-master test cluster could help to avoid
 > leader-related uncertainty here?
 > 
 > BTW, it seems we have some unit-test like coverage in
 > TestHandleTooBusy.java.  Maybe, it's possible to add a new test
 > scenario in there?

Ah, TestHandleTooBusy.java relies on real test cluster to be around: it's not a good reference for mock-like scenarios.

From the other side, maybe you could piggy-back on RPC traces here?  Anyway, if you think that piece doesn't need extra-coverage and you verified it works, maybe having it as-is good enough :)


-- 
To view, visit http://gerrit.cloudera.org:8080/13308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3437c779322e756a6e1fbc19f464f765741d58b
Gerrit-Change-Number: 13308
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 13 May 2019 22:24:02 +0000
Gerrit-HasComments: No

[kudu-CR] [java] Fix handling of SERVICE UNAVAILABLE errors

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13308 )

Change subject: [java] Fix handling of SERVICE_UNAVAILABLE errors
......................................................................


Patch Set 1: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3437c779322e756a6e1fbc19f464f765741d58b
Gerrit-Change-Number: 13308
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 22 May 2019 23:37:32 +0000
Gerrit-HasComments: No

[kudu-CR] [java] Fix handling of SERVICE UNAVAILABLE errors

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13308 )

Change subject: [java] Fix handling of SERVICE_UNAVAILABLE errors
......................................................................


Patch Set 1:

(1 comment)

> (1 comment)
 > 
 > > (1 comment)
 > >
 > > Could we test this? Set up a test scenario where the client
 > > receives a service unavailable error, then prove (using RPC
 > > metrics?) that it doesn't bounce to the master for a new lookup?
 > 
 > I worry that has flaky potential, e.g. if the service unavailable
 > error is received, but then on retry the leader stepped down and it
 > triggers a master lookup. It'd be nicer if we had a good way to
 > test this error handling code without needing a running cluster.
 > 
 > FWIW, I was running a BDR test workload and seeing tons of silly
 > GTLs logged by the client. I tried using the client with this patch
 > and saw those GTLs no more.

Maybe, using single-master test cluster could help to avoid leader-related uncertainty here?

BTW, it seems we have some unit-test like coverage in TestHandleTooBusy.java.  Maybe, it's possible to add a new test scenario in there?

http://gerrit.cloudera.org:8080/#/c/13308/1/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java:

http://gerrit.cloudera.org:8080/#/c/13308/1/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java@418
PS1, Line 418:     if (exception instanceof InvalidAuthnTokenException) {
> If this is meant to be refactored, could you add a TODO here or at the call
+1



-- 
To view, visit http://gerrit.cloudera.org:8080/13308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3437c779322e756a6e1fbc19f464f765741d58b
Gerrit-Change-Number: 13308
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 13 May 2019 22:16:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] Fix handling of SERVICE UNAVAILABLE errors

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/13308 )

Change subject: [java] Fix handling of SERVICE_UNAVAILABLE errors
......................................................................


Patch Set 1:

(1 comment)

> (1 comment)
 > 
 > Could we test this? Set up a test scenario where the client
 > receives a service unavailable error, then prove (using RPC
 > metrics?) that it doesn't bounce to the master for a new lookup?

I worry that has flaky potential, e.g. if the service unavailable error is received, but then on retry the leader stepped down and it triggers a master lookup. It'd be nicer if we had a good way to test this error handling code without needing a running cluster.

FWIW, I was running a BDR test workload and seeing tons of silly GTLs logged by the client. I tried using the client with this patch and saw those GTLs no more.

http://gerrit.cloudera.org:8080/#/c/13308/1/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java:

http://gerrit.cloudera.org:8080/#/c/13308/1/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java@418
PS1, Line 418:     if (exception instanceof InvalidAuthnTokenException) {
> Can these invalid token exceptions actually happen? It looks like the calle
I program defensively in the Java client. If all callers take care of authn exceptions, then handling authn exceptions here doesn't hurt; if not, then this helps. Eventually, I'd like to (or I'd like someone else to) refactor this code so it's easy to understand and I don't feel like I need to be so defensive when handling errors.



-- 
To view, visit http://gerrit.cloudera.org:8080/13308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3437c779322e756a6e1fbc19f464f765741d58b
Gerrit-Change-Number: 13308
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 13 May 2019 16:51:38 +0000
Gerrit-HasComments: Yes