Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2021/02/25 21:39:55 UTC

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17124 )


Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................

[java] KUDU-3213: try at different server on TABLET_NOT_RUNNING

Prior to this patch, if a tablet server were quiescing for a prolonged
period, scan requests could time out, complaining that the tablet server
is quiescing, but without ever retrying the scan at another tablet
server. This is because tablet servers will return TABLET_NOT_RUNNING to
clients when attempting a scan while quiescing. The behavior in the C++
client is that the location is then blacklisted and the request is
retried elsewhere. The behavior in the Java client, though, is that the
same location is retried until failure.

This patch addresses this by treating TABLET_NOT_RUNNING errors in the
Java client as we would for TABLET_NOT_FOUND, which is actually quite
similar to the handling for TABLET_NOT_RUNNING in the C++ client: the
location is invalidated for further attempts, and the request is retried
elsewhere.
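
As a rough illustration of the handling described above (not the actual
RpcProxy code), the retry-elsewhere behavior can be sketched as follows.
The class, the TabletLocations cache, and the sendScan callback are all
hypothetical names made up for this sketch; only the error code names come
from this change description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class RetryElsewhereSketch {
  // Hypothetical error codes; the names mirror those in the commit message.
  enum ErrorCode { OK, TABLET_NOT_FOUND, TABLET_NOT_RUNNING }

  // Hypothetical per-tablet cache of servers still considered usable.
  static final class TabletLocations {
    private final List<String> servers;

    TabletLocations(List<String> servers) {
      this.servers = new ArrayList<>(servers);
    }

    // Returns the next candidate server, or null if none remain.
    String pick() {
      return servers.isEmpty() ? null : servers.get(0);
    }

    // Drops a server so later attempts go elsewhere.
    void invalidate(String server) {
      servers.remove(server);
    }
  }

  // Tries each remaining location until one accepts the scan.
  static String scan(TabletLocations locations, Function<String, ErrorCode> sendScan) {
    String server;
    while ((server = locations.pick()) != null) {
      ErrorCode code = sendScan.apply(server);
      switch (code) {
        case OK:
          return server;
        case TABLET_NOT_FOUND:
        case TABLET_NOT_RUNNING:
          // Both errors are handled the same way: invalidate and retry elsewhere.
          locations.invalidate(server);
          break;
      }
    }
    throw new IllegalStateException("no remaining locations for tablet");
  }

  public static void main(String[] args) {
    TabletLocations locations = new TabletLocations(Arrays.asList("ts-1", "ts-2", "ts-3"));
    // Pretend ts-1 is quiescing and answers every scan with TABLET_NOT_RUNNING.
    String served = scan(locations,
        s -> s.equals("ts-1") ? ErrorCode.TABLET_NOT_RUNNING : ErrorCode.OK);
    System.out.println("scan served by " + served); // prints "scan served by ts-2"
  }
}
```

Before this patch, the equivalent of the TABLET_NOT_RUNNING case would have
retried the same server until the request timed out.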

Why not just have quiescing tablet servers return TABLET_NOT_FOUND,
then? TABLET_NOT_FOUND errors in the C++ client actually have some
behavior not present in the Java client: a tablet whose location is
invalidated with TABLET_NOT_FOUND in the C++ client will be required to
be looked up again, requiring a round trip to the master. This behavior
doesn't exist in the Java client, so I thought it easiest to piggyback
on TABLET_NOT_FOUND handling for now.
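
The difference between the two invalidation styles mentioned above can be
pictured with a toy model. This is only a sketch under stated assumptions:
LocationCache, the requireLookup flag, and the masterLookups counter are
hypothetical and do not correspond to real classes in either client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InvalidationPolicySketch {
  // Hypothetical cache of tablet locations backed by a "master" view.
  static final class LocationCache {
    private final List<String> masterView;
    private List<String> cached;
    private boolean stale = false;
    int masterLookups = 0;

    LocationCache(List<String> masterView) {
      this.masterView = masterView;
      this.cached = new ArrayList<>(masterView);
    }

    // Drops one server; if requireLookup is set, the whole entry must be
    // re-fetched from the master before it is used again.
    void invalidate(String server, boolean requireLookup) {
      cached.remove(server);
      if (requireLookup) {
        stale = true;
      }
    }

    // Returns usable locations, paying a master round trip if stale.
    List<String> locations() {
      if (stale) {
        masterLookups++;
        cached = new ArrayList<>(masterView);
        stale = false;
      }
      return cached;
    }
  }

  public static void main(String[] args) {
    List<String> replicas = Arrays.asList("ts-1", "ts-2", "ts-3");

    // Invalidate only the failed location: no master round trip.
    LocationCache localOnly = new LocationCache(replicas);
    localOnly.invalidate("ts-1", false);
    System.out.println(localOnly.locations() + " lookups=" + localOnly.masterLookups);

    // Invalidate and require a fresh lookup: one master round trip.
    LocationCache withLookup = new LocationCache(replicas);
    withLookup.invalidate("ts-1", true);
    System.out.println(withLookup.locations() + " lookups=" + withLookup.masterLookups);
  }
}
```

In this model the first style is the cheaper one, which is the reason given
above for reusing the Java client's TABLET_NOT_FOUND handling.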

Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
---
M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
3 files changed, 57 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/17124/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17124
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17124 )

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17124/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/17124/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@96
PS4, Line 96:  set some partitioning though).
> nit: would it make sense to use hash partitioning instead?  Otherwise, how 
The partitioning isn't important here other than the fact that Kudu complains if there's none set. Added a comment.


http://gerrit.cloudera.org:8080/#/c/17124/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@120
PS4, Line 120: rver.waitFor());
> nit: how do we know it's so, indeed?  Could it happen that the scanner alwa
We don't, and it might not. But at least without proper handling of quiescing servers, this test fails a non-negligible fraction of the time.




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Mar 2021 19:58:35 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17124 )

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@118
PS3, Line 118: assertEquals(0, quiesceTserver.waitFor());
             : 
> why do we need to call waitFor() twice here?
Done




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 22:10:42 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Grant Henke, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17124

to look at the new patch set (#5).

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................

Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
---
M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
3 files changed, 59 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/17124/5

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17124 )

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/17124/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@118
PS3, Line 118: quiesceTserver.waitFor();
             :     assertEquals(0, quiesceTserver.waitFor()
why do we need to call waitFor() twice here?




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 20:25:15 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17124 )

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................


Patch Set 4: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17124/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/17124/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@96
PS4, Line 96: setRangePartitionColumns(Collections.singletonList("key"))
nit: would it make sense to use hash partitioning instead?  Otherwise, how do we know that the quiescing tablet server hosts the replica that contains the necessary data?  If that holds even with a range-partitioned table, it would be great if you could add a small comment explaining why.  Thanks!


http://gerrit.cloudera.org:8080/#/c/17124/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@120
PS4, Line 120: if the scan goes to the quiescing server
nit: how do we know it's so, indeed?  Could it happen that the scanner always hits only non-quiescing servers?




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Mar 2021 02:52:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17124 )

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................


Patch Set 3: Verified+1

Unrelated failure of HmsConfigurations/MasterFailoverTest.TestMasterUUIDResolution/1



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 03:22:25 +0000
Gerrit-HasComments: No

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Attila Bukor, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17124

to look at the new patch set (#3).

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................

Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
---
M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
3 files changed, 57 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/17124/3

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17124 )

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................


Patch Set 5: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17124/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/17124/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@120
PS4, Line 120: rver.waitFor());
> We don't, and we might not. But without proper handling of quiescing server
I see.  Thank you for the explanation.  I think it's good enough if it fails in a non-negligible number of runs.  SGTM.




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 06 Mar 2021 07:42:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Attila Bukor, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17124

to look at the new patch set (#2).

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................

Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
---
M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
3 files changed, 57 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/17124/2

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17124 )

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................

Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Reviewed-on: http://gerrit.cloudera.org:8080/17124
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
3 files changed, 59 insertions(+), 4 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved


Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Attila Bukor, Kudu Jenkins, Grant Henke, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17124

to look at the new patch set (#4).

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................

Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
---
M java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/KuduTestHarness.java
3 files changed, 56 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/17124/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17124
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has removed a vote on this change.

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................


Removed Verified-1 by Kudu Jenkins (120)

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [java] KUDU-3213: try at different server on TABLET NOT RUNNING

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17124 )

Change subject: [java] KUDU-3213: try at different server on TABLET_NOT_RUNNING
......................................................................


Patch Set 4: Code-Review+2



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ac84a52676ff361fa1ba996665b338d1bbfba1
Gerrit-Change-Number: 17124
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Mar 2021 00:03:12 +0000
Gerrit-HasComments: No