Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2020/10/27 05:32:45 UTC

[kudu-CR] [tserver] validator for --scanner_max_wait_ms

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16655 )


Change subject: [tserver] validator for --scanner_max_wait_ms
......................................................................

[tserver] validator for --scanner_max_wait_ms

This patch adds a group validator that checks --scanner_max_wait_ms
against the value of --raft_heartbeat_interval_ms.  As of now, the
validator outputs a warning if --scanner_max_wait_ms is set too low
compared with --raft_heartbeat_interval_ms.  In addition,
--scanner_max_wait_ms is now tagged as 'runtime' to reflect its de facto
behavior.

I also did a minor clean-up of the surrounding code.

I didn't add any test, but I verified that the warning is output upon
kudu-tserver's startup as intended when --scanner_max_wait_ms is set too
low compared with the current setting for --raft_heartbeat_interval_ms.

Change-Id: I1dec4173a9ae50a4de34b909283c5a2ee4ef9166
---
M src/kudu/consensus/time_manager.cc
M src/kudu/tserver/tablet_service.cc
2 files changed, 64 insertions(+), 27 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/16655/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1dec4173a9ae50a4de34b909283c5a2ee4ef9166
Gerrit-Change-Number: 16655
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [tserver] validator for --scanner_max_wait_ms

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16655

to look at the new patch set (#2).

Change subject: [tserver] validator for --scanner_max_wait_ms
......................................................................

[tserver] validator for --scanner_max_wait_ms

This patch adds a group validator that checks --scanner_max_wait_ms
against the value of --raft_heartbeat_interval_ms.  As of now, the
validator outputs a warning if --scanner_max_wait_ms is set too low
compared with --raft_heartbeat_interval_ms.  In addition,
--scanner_max_wait_ms is now tagged as 'runtime' to reflect its de facto
behavior.

I also did a minor clean-up of the related code.

I didn't add any test, but I verified that the warning is output upon
kudu-tserver's startup as intended when --scanner_max_wait_ms is set too
low compared with the current setting for --raft_heartbeat_interval_ms.

Change-Id: I1dec4173a9ae50a4de34b909283c5a2ee4ef9166
---
M src/kudu/consensus/time_manager.cc
M src/kudu/tserver/tablet_service.cc
2 files changed, 64 insertions(+), 27 deletions(-)
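
For readers without access to the patch, here is a minimal standalone sketch of the kind of soft check the commit message describes. It is not the code from the change: the `FlagValues` struct and both function names are hypothetical, plain integers stand in for the real flag definitions, and the threshold (strictly less than one heartbeat interval) is an assumption, since the thread does not state the exact comparison.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical stand-in for the two tserver flags. In Kudu these are
// command-line flags, not fields of a struct.
struct FlagValues {
  int scanner_max_wait_ms;
  int raft_heartbeat_interval_ms;
};

// Returns true and fills in 'warning' when the scanner wait budget looks too
// low. ASSUMPTION: "too low" means strictly less than one Raft heartbeat
// interval; the actual threshold used by the patch is not given in the thread.
bool ScannerWaitLooksTooLow(const FlagValues& flags, std::string* warning) {
  assert(warning != nullptr);
  if (flags.scanner_max_wait_ms >= flags.raft_heartbeat_interval_ms) {
    return false;
  }
  *warning = "--scanner_max_wait_ms (" +
             std::to_string(flags.scanner_max_wait_ms) +
             ") is lower than --raft_heartbeat_interval_ms (" +
             std::to_string(flags.raft_heartbeat_interval_ms) +
             "); consider increasing it to at least " +
             std::to_string(flags.raft_heartbeat_interval_ms);
  return true;
}

// Soft validator: logs a warning but always accepts the flag values, so the
// server still starts with a questionable combination.
bool ValidateScannerMaxWait(const FlagValues& flags) {
  std::string warning;
  if (ScannerWaitLooksTooLow(flags, &warning)) {
    std::cerr << "WARNING: " << warning << std::endl;
  }
  return true;
}
```

Calling `ValidateScannerMaxWait(FlagValues{100, 500})` would print the warning and still return true, which matches the "warn only" behavior described above.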


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/16655/2

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1dec4173a9ae50a4de34b909283c5a2ee4ef9166
Gerrit-Change-Number: 16655
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [tserver] validator for --scanner_max_wait_ms

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16655 )

Change subject: [tserver] validator for --scanner_max_wait_ms
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16655/2/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16655/2/src/kudu/tserver/tablet_service.cc@210
PS2, Line 210:   return true;
> Some background behind this: I was looking at an issue from the field and
I see. Thanks for clarifying!




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dec4173a9ae50a4de34b909283c5a2ee4ef9166
Gerrit-Change-Number: 16655
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 28 Oct 2020 05:29:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tserver] validator for --scanner_max_wait_ms

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16655 )

Change subject: [tserver] validator for --scanner_max_wait_ms
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16655/2/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16655/2/src/kudu/tserver/tablet_service.cc@204
PS2, Line 204: at least up to $2
> nit: "to at least $2", otherwise this may read as though the user should in
Done


http://gerrit.cloudera.org:8080/#/c/16655/2/src/kudu/tserver/tablet_service.cc@210
PS2, Line 210:   return true;
> Is this to say that it's not actually that critical an issue? Will snapshot
Some background behind this: I was looking at an issue from the field, and I initially thought the problem was related to a difference in local time between tablet servers. I had a theory about the distribution of tablet replicas, where the client was writing and then reading back from a follower replica with a lagging local clock.

However, after deeper investigation I realized that was not likely the case. I found that the issue is most likely related to the fact that the follower replica that served the timed-out snapshot scan had accumulated many operations and was slow to apply them.

With these findings, I guess trying to enforce this relationship between --scanner_max_wait_ms and --raft_heartbeat_interval_ms doesn't make much sense unless we think it's common to have a few seconds of difference in local clocks among different tablet servers. I guess the latter is very unlikely, and it should rather be considered an anomaly.

With that, I don't think we want to introduce this validator, actually. So, I moved the rest of the changes into a separate changelist and posted it for review: https://gerrit.cloudera.org/#/c/16669/

Meanwhile, I'm abandoning this changelist.




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dec4173a9ae50a4de34b909283c5a2ee4ef9166
Gerrit-Change-Number: 16655
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 28 Oct 2020 03:52:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tserver] validator for --scanner_max_wait_ms

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [tserver] validator for --scanner_max_wait_ms
......................................................................


Removed Verified-1 by Kudu Jenkins (120)

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I1dec4173a9ae50a4de34b909283c5a2ee4ef9166
Gerrit-Change-Number: 16655
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [tserver] validator for --scanner_max_wait_ms

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has abandoned this change. ( http://gerrit.cloudera.org:8080/16655 )

Change subject: [tserver] validator for --scanner_max_wait_ms
......................................................................


Abandoned

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I1dec4173a9ae50a4de34b909283c5a2ee4ef9166
Gerrit-Change-Number: 16655
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [tserver] validator for --scanner_max_wait_ms

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16655 )

Change subject: [tserver] validator for --scanner_max_wait_ms
......................................................................


Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16655/2/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16655/2/src/kudu/tserver/tablet_service.cc@204
PS2, Line 204: at least up to $2
nit: "to at least $2", otherwise this may read as though the user should increase by an additional heartbeat interval.


http://gerrit.cloudera.org:8080/#/c/16655/2/src/kudu/tserver/tablet_service.cc@210
PS2, Line 210:   return true;
Is this to say that it's not actually that critical an issue? Will snapshot scans with the latest timestamp _mostly_ pass even if improperly set? My concern is that a warning is indeed not severe enough to spur action on the operator, though I do understand why a soft validation is desirable, assuming the service runs ok in a lot of cases even if improperly set.
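
The trade-off raised in this comment (a warning that may be ignored versus a hard failure that blocks startup) can be illustrated with a small standalone sketch. Nothing here comes from the patch: the `OnViolation` enum and the `ShouldStart` function are hypothetical names chosen for illustration, and the sketch only assumes that a validator's boolean result decides whether the process is allowed to start.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical policy knob: what to do when a cross-flag check fails.
enum class OnViolation {
  kWarnOnly,       // soft validation: log and keep going
  kRefuseToStart,  // hard validation: log and reject the flag values
};

// Returns true if the server should be allowed to start. 'flags_consistent'
// is the outcome of whatever cross-flag check was performed; 'detail' is the
// human-readable explanation to log when the check failed.
bool ShouldStart(bool flags_consistent,
                 OnViolation policy,
                 const std::string& detail,
                 std::ostream* log) {
  assert(log != nullptr);
  if (flags_consistent) {
    return true;
  }
  if (policy == OnViolation::kWarnOnly) {
    *log << "WARNING: " << detail << '\n';
    return true;
  }
  *log << "ERROR: " << detail << '\n';
  return false;
}
```

With `kWarnOnly` the operator only sees a log line, which is the concern voiced above; with `kRefuseToStart` a deployment that happens to work fine with the "improper" setting would fail to come up after an upgrade.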




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dec4173a9ae50a4de34b909283c5a2ee4ef9166
Gerrit-Change-Number: 16655
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 27 Oct 2020 06:29:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tserver] validator for --scanner_max_wait_ms

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16655 )

Change subject: [tserver] validator for --scanner_max_wait_ms
......................................................................


Patch Set 2: Verified+1

unrelated test failure in ToolTest.TestHmsList



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dec4173a9ae50a4de34b909283c5a2ee4ef9166
Gerrit-Change-Number: 16655
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 27 Oct 2020 06:20:47 +0000
Gerrit-HasComments: No