You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2017/06/29 00:55:49 UTC

[kudu-CR] kserver: consolidate randomized failure monitors

Hello David Ribeiro Alves, Mike Percy, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/7330

to review the following change.

Change subject: kserver: consolidate randomized failure monitors
......................................................................

kserver: consolidate randomized failure monitors

The failure detector design (i.e. the split between FailureDetector and
RandomizedFailureMonitor) makes it quite simple to consolidate the threads
used by failure detectors: simply share a single RandomizedFailureMonitor.
Most of this patch is plumbing to enable that.

What's actually interesting?
- When failure detection is activated, we create a cycle between
  RaftConsensus and FailureDetector in order to protect against failure
  callbacks triggering after RaftConsensus is destroyed. The cycle is broken
  during Shutdown. If/when RaftConsensus is changed to use a standard
  shared_ptr, we can bind with a weak_ptr and avoid the cycle.
- kserver now depends on consensus; this means it can use gflags defined in
  consensus. Accordingly, the raft pool's custom idle time has been restored
  (it was removed in commit 3fb84a7).

Change-Id: I096e3e89bf6e8925f6ea0382fc319d7382237787
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/kserver/CMakeLists.txt
M src/kudu/kserver/kserver.cc
M src/kudu/kserver/kserver.h
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/ts_tablet_manager.cc
12 files changed, 112 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/7330/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I096e3e89bf6e8925f6ea0382fc319d7382237787
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] kserver: consolidate randomized failure monitors

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: kserver: consolidate randomized failure monitors
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7330/1/src/kudu/kserver/kserver.cc
File src/kudu/kserver/kserver.cc:

Line 92:       failure_monitor_(GetRandomSeed32(),
> we used to have a different seed for each tablet's FailureMonitor, and now 
Agreed. One thing we could consider is scheduling failure callbacks with some random jitter after we have determined they have failed, maybe with one additional last-second check that they didn't recover during that small window.


-- 
To view, visit http://gerrit.cloudera.org:8080/7330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I096e3e89bf6e8925f6ea0382fc319d7382237787
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] kserver: consolidate randomized failure monitors

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: kserver: consolidate randomized failure monitors
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7330/1/src/kudu/kserver/kserver.cc
File src/kudu/kserver/kserver.cc:

Line 92:       failure_monitor_(GetRandomSeed32(),
we used to have a different seed for each tablet's FailureMonitor, and now we have a single one across all tablets.

Does this reduce jitter between the time at which tablets detect failures, causing potentially worse "floods" of detected failures when a process pauses or otherwise gets disconnected from peers? The jitter is an important feature that I think we need to retain.


-- 
To view, visit http://gerrit.cloudera.org:8080/7330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I096e3e89bf6e8925f6ea0382fc319d7382237787
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] kserver: consolidate randomized failure monitors

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has abandoned this change.

Change subject: kserver: consolidate randomized failure monitors
......................................................................


Abandoned

I went in a different direction and reimplemented failure detection using timer-based scheduling: https://gerrit.cloudera.org/#/c/7735/

-- 
To view, visit http://gerrit.cloudera.org:8080/7330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I096e3e89bf6e8925f6ea0382fc319d7382237787
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>