You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Mike Percy (Code Review)" <ge...@cloudera.org> on 2018/10/10 07:10:01 UTC

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Mike Percy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11646


Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................

client: add timeout duration and scan attempts to scanner errors

Previously, scanner errors could be confusing because certain types of
errors, like ServiceUnavailable, are retriable, and the RPC timeouts
mentioned in some error messages tend to be confusingly short because
after several retries the client-side deadline is looming. This patch
adds additional details to scanner timeout error messages, including
information about number of retries and the client-set timeout value.

An example of a new "enhanced" error message looks something like this
(from the new test that injects an error on scan):

Timed out: exceeded configured scan timeout of 1.000s: after 7 scan attempts: unable to retry before timeout: Remote error: Service unavailable: Injecting service unavailable status on Scan due to --scanner_inject_service_unavailable_on_continue_scan

Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
---
M src/kudu/client/client-test.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 55 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/11646/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 1
Gerrit-Owner: Mike Percy <mp...@apache.org>

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11646 )

Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................

client: add timeout duration and scan attempts to scanner errors

Previously, scanner errors could be confusing because certain types of
errors, like ServiceUnavailable, are retriable, and the RPC timeouts
mentioned in some error messages tend to be confusingly short because
after several retries the client-side deadline is looming. This patch
adds additional details to scanner timeout error messages, including
information about number of retries and the client-set timeout value.

An example of a new "enhanced" error message looks something like this
(from the new test that injects an error on scan):

Timed out: exceeded configured scan timeout of 1.000s: after 7 scan attempts: unable to retry before timeout: Remote error: Service unavailable: Injecting service unavailable status on Scan due to --scanner_inject_service_unavailable_on_continue_scan

Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Reviewed-on: http://gerrit.cloudera.org:8080/11646
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Mike Percy <mp...@apache.org>
---
M src/kudu/client/client-test.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 53 insertions(+), 8 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Mike Percy: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 6
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11646 )

Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................


Patch Set 5: Verified+1

overriding jenkins failure due to isolateserver glitch


-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 5
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 02:08:47 +0000
Gerrit-HasComments: No

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11646 )

Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/client/client-test.cc@5173
PS3, Line 5173:   // As the tservers are still starting up, the scan will retry until it
              :   // times out. The actual error should be embedded in the returned status.
> The first sentence doesn't make sense; ClientTest::SetUp() starts the clust
oops, thanks for the catch


http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/client/scanner-internal.cc@498
PS3, Line 498:     s = HandleError(scan_status, deadline, blacklist, /* needs_reopen=*/ nullptr);
             :     RETURN_NOT_OK(s);
> Why this change?
Spurious change -- I backed out an error handling change. Reverted.


http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/tserver/tablet_service.cc@2021
PS3, Line 2021:   if (PREDICT_FALSE(FLAGS_scanner_inject_service_unavailable_on_continue_scan)) {
              :     return Status::ServiceUnavailable("Injecting service unavailable status on Scan due to "
              :                                       "--scanner_inject_service_unavailable_on_continue_scan");
              :   }
> Nit: would prefer if this came at the top of the function; makes it easier 
The intention behind this location was to only reject the service when it was possible to service but leave the rest of the logic as-is, i.e. if the scanner cannot be found ideally we would still return a scanner not found error.



-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 3
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 23:50:56 +0000
Gerrit-HasComments: Yes

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11646 )

Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 3
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 22:02:13 +0000
Gerrit-HasComments: No

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11646

to look at the new patch set (#5).

Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................

client: add timeout duration and scan attempts to scanner errors

Previously, scanner errors could be confusing because certain types of
errors, like ServiceUnavailable, are retriable, and the RPC timeouts
mentioned in some error messages tend to be confusingly short because
after several retries the client-side deadline is looming. This patch
adds additional details to scanner timeout error messages, including
information about number of retries and the client-set timeout value.

An example of a new "enhanced" error message looks something like this
(from the new test that injects an error on scan):

Timed out: exceeded configured scan timeout of 1.000s: after 7 scan attempts: unable to retry before timeout: Remote error: Service unavailable: Injecting service unavailable status on Scan due to --scanner_inject_service_unavailable_on_continue_scan

Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
---
M src/kudu/client/client-test.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 53 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/11646/5
-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 5
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11646

to look at the new patch set (#3).

Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................

client: add timeout duration and scan attempts to scanner errors

Previously, scanner errors could be confusing because certain types of
errors, like ServiceUnavailable, are retriable, and the RPC timeouts
mentioned in some error messages tend to be confusingly short because
after several retries the client-side deadline is looming. This patch
adds additional details to scanner timeout error messages, including
information about number of retries and the client-set timeout value.

An example of a new "enhanced" error message looks something like this
(from the new test that injects an error on scan):

Timed out: exceeded configured scan timeout of 1.000s: after 7 scan attempts: unable to retry before timeout: Remote error: Service unavailable: Injecting service unavailable status on Scan due to --scanner_inject_service_unavailable_on_continue_scan

Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
---
M src/kudu/client/client-test.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 55 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/11646/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 3
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11646

to look at the new patch set (#4).

Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................

client: add timeout duration and scan attempts to scanner errors

Previously, scanner errors could be confusing because certain types of
errors, like ServiceUnavailable, are retriable, and the RPC timeouts
mentioned in some error messages tend to be confusingly short because
after several retries the client-side deadline is looming. This patch
adds additional details to scanner timeout error messages, including
information about number of retries and the client-set timeout value.

An example of a new "enhanced" error message looks something like this
(from the new test that injects an error on scan):

Timed out: exceeded configured scan timeout of 1.000s: after 7 scan attempts: unable to retry before timeout: Remote error: Service unavailable: Injecting service unavailable status on Scan due to --scanner_inject_service_unavailable_on_continue_scan

Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
---
M src/kudu/client/client-test.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 53 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/11646/4
-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 4
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11646 )

Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 5
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 00:08:25 +0000
Gerrit-HasComments: No

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11646 )

Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/client/client-test.cc@5173
PS3, Line 5173:   // As the tservers are still starting up, the scan will retry until it
              :   // times out. The actual error should be embedded in the returned status.
The first sentence doesn't make sense; ClientTest::SetUp() starts the cluster and waits for the tservers to finish starting.

Seems like it was copied from the previous test which explicitly restarts the tservers.


http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/client/scanner-internal.cc@498
PS3, Line 498:     s = HandleError(scan_status, deadline, blacklist, /* needs_reopen=*/ nullptr);
             :     RETURN_NOT_OK(s);
Why this change?


http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/11646/3/src/kudu/tserver/tablet_service.cc@2021
PS3, Line 2021:   if (PREDICT_FALSE(FLAGS_scanner_inject_service_unavailable_on_continue_scan)) {
              :     return Status::ServiceUnavailable("Injecting service unavailable status on Scan due to "
              :                                       "--scanner_inject_service_unavailable_on_continue_scan");
              :   }
Nit: would prefer if this came at the top of the function; makes it easier to skip test-only stuff and focus on code with a production impact.



-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 3
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 16:50:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has removed a vote on this change.

Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 5
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] client: add timeout duration and scan attempts to scanner errors

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11646

to look at the new patch set (#2).

Change subject: client: add timeout duration and scan attempts to scanner errors
......................................................................

client: add timeout duration and scan attempts to scanner errors

Previously, scanner errors could be confusing because certain types of
errors, like ServiceUnavailable, are retriable, and the RPC timeouts
mentioned in some error messages tend to be confusingly short because
after several retries the client-side deadline is looming. This patch
adds additional details to scanner timeout error messages, including
information about number of retries and the client-set timeout value.

An example of a new "enhanced" error message looks something like this
(from the new test that injects an error on scan):

Timed out: exceeded configured scan timeout of 1.000s: after 7 scan attempts: unable to retry before timeout: Remote error: Service unavailable: Injecting service unavailable status on Scan due to --scanner_inject_service_unavailable_on_continue_scan

Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
---
M src/kudu/client/client-test.cc
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 56 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/11646/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a8f731f029132d0894355098d5804840f09e7c2
Gerrit-Change-Number: 11646
Gerrit-PatchSet: 2
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Kudu Jenkins