You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2017/09/20 01:55:10 UTC

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Hello Mike Percy, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/8107

to review the following change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().

I spent some time trying to test this and couldn't come up with anything
reliable, even after adding latency injection to ConsensusMetadata::Flush().

This is a super targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that'll require more work.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 22 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/8107/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8107/4/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS4, Line 788: be
> may be
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/consensus/consensus_meta.cc
File src/kudu/consensus/consensus_meta.cc:

Line 276:   if (PREDICT_FALSE(FLAGS_cmeta_inject_latency_on_flush_ms > 0)) {
you may need to ANNOTATE_UNPROTECTED_READ this here and below


http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS3, Line 125: be allowed at a time
it doesn't actually serialize the whole election, right? it just serializes the _starting_ of elections? I think its' worth noting somewhere that you are mostly trying to avoid the case where the consensus lock is held and thus even the _starting_ of the election is quite slow


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/consensus/consensus_meta.cc
File src/kudu/consensus/consensus_meta.cc:

Line 276:   if (PREDICT_FALSE(FLAGS_cmeta_inject_latency_on_flush_ms > 0)) {
> you may need to ANNOTATE_UNPROTECTED_READ this here and below
Sure. That's because this flag can be modified at runtime?


http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS3, Line 125: be allowed at a time
> it doesn't actually serialize the whole election, right? it just serializes
Sure, I'll reword this.

As a side note, one of the things I dislike about how we treat test-only hidden gflags is that for some reason we still bother to provide descriptions for them, even though they're totally unnecessary.


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().

This is a super targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that'll require more work.

Originally I had written a test that injects latency into
ConsensusMetadata::Flush(), toggles the fix, and compares the number of vote
request RPCs. I couldn't get it to be totally robust, and the "feature flag"
used in the toggle is likely to become obselete quickly. So in the end I
decided to drop the test from the patch.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Reviewed-on: http://gerrit.cloudera.org:8080/8107
Reviewed-by: Mike Percy <mp...@apache.org>
Tested-by: Kudu Jenkins
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 23 insertions(+), 3 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS3, Line 786: Note: the lock is only ever acquired via try_lock(); if it cannot be
             :   // acquired, an election must be in progress so the next one is skipped.
> I'm not sure I understand this.  As I could see, LeaderElection::Run() requ
Correct. The lock is not held for the entirety of the election; just for the start of it. So if you try to acquire it and fail, it's a sign that an election is _starting_ (which is a subset of "an election is _running_").

However, the converse ("successfully acquiring the lock means there's no election in progress") is not true.

Please see the updated wording I used in PS4 and let me know if it's still unclear.


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8107

to look at the new patch set (#3).

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().

This is a super targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that'll require more work.

The included test is pretty ugly, especially the "feature flag" part. It
failed 1 of 1000 runs in DEBUG mode with the number of votes being equal, so
it's probably not robust enough to be merged as is.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
4 files changed, 148 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/8107/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/consensus/consensus_meta.cc
File src/kudu/consensus/consensus_meta.cc:

Line 276:   if (PREDICT_FALSE(FLAGS_cmeta_inject_latency_on_flush_ms > 0)) {
> Sure. That's because this flag can be modified at runtime?
yea, in other tests i saw occasional TSAN races flagged by runtime-changed flags. they're "harmless"


http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS3, Line 125: be allowed at a time
> Sure, I'll reword this.
that's true. I would be on-board with changing them all to "test-only" and assume that anyone who cares would just grep for usage sites


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8107/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 3244:     // Drive up the number of elections by reducing the failure period.
If it makes sense for this scenario, consider adding --raft_enable_pre_election=false to introduce more volatility into the process.


Line 3245:     "--raft_heartbeat_interval_ms=100"
Would it make sense to decrease the default --leader_failure_max_missed_heartbeat_periods as well (say, down to 1.1 or even to 1.0)?


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS3, Line 786: Note: the lock is only ever acquired via try_lock(); if it cannot be
             :   // acquired, an election must be in progress so the next one is skipped.
I'm not sure I understand this.  As I could see, LeaderElection::Run() requests consensus votes asynchronously, so even if RaftConsensus::StartElection() returns, that doesn't mean election is complete.

Could you please clarify on this?


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS3, Line 786: 
             :   // Note: the lock is only ever acquired via try_lock(); if it cannot be
> Yes. I think the cleanest way to do that is to modify the failure detector 
OK, I see.  Thank you for the clarification!


http://gerrit.cloudera.org:8080/#/c/8107/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 3244:     // Drive up the number of elections by reducing the failure period.
> Disabling pre-elections also cuts the number of elections (and thus votes) 
SGTM.


Line 3245:     "--raft_heartbeat_interval_ms=100"
> I don't think it's strictly necessary. 300 ms to detect a failure is reason
Yep, anyway 300 < 5000, so it's good enough, I agree.


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8107/4/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS4, Line 788: be
may be


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS3, Line 786: Note: the lock is only ever acquired via try_lock(); if it cannot be
             :   // acquired, an election must be in progress so the next one is skipped.
> Correct. The lock is not held for the entirety of the election; just for th
Yes, the new version of updated comment in PS4 is clearer, thanks.

Just another question: do I understand correctly (at least, that's the impression I got from the commit message.) that a proper fix would be to avoid starting an election if the previous one hasn't finished or timed out yet?


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Todd Lipcon, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8107

to look at the new patch set (#7).

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().

This is a super targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that'll require more work.

Originally I had written a test that injects latency into
ConsensusMetadata::Flush(), toggles the fix, and compares the number of vote
request RPCs. I couldn't get it to be totally robust, and the "feature flag"
used in the toggle is likely to become obselete quickly. So in the end I
decided to drop the test from the patch.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 23 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/8107/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Todd Lipcon, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8107

to look at the new patch set (#5).

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().

This is a super targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that'll require more work.

Originally I had written a test that injects latency into
ConsensusMetadata::Flush(), toggles the fix, and compares the number of vote
request RPCs. I couldn't get it to be totally robust, and the "feature flag"
used in the toggle is likely to become obselete quickly. So in the end I
decided to drop the test from the patch.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 23 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/8107/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS3, Line 3257: ASSERT_OK(FindTabletLeader(tablet_servers_, tablet_id, kTimeout, &leader_tsd));
Why not to move this under the loop scope so it would be the first operation of it?  Doing so, it might be possible to get rid of the later of FindTabletLeader().

Also, if not doing so, how to make sure that the actual leader TS hasn't changed on the second iteration after the tserver with the initial leader was restarted?


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS3, Line 3257: ASSERT_OK(FindTabletLeader(tablet_servers_, tablet_id, kTimeout, &leader_tsd));
> I can move it into the start of the loop to replace the call at the end of 
Oh, I see.  It was just my misunderstanding.  Thank you for the clarification.


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3: Verified+1

Overriding Jenkins, the two failures were in starting a TSAN test and an unrelated flake.

-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Todd Lipcon, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8107

to look at the new patch set (#6).

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().

This is a super targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that'll require more work.

Originally I had written a test that injects latency into
ConsensusMetadata::Flush(), toggles the fix, and compares the number of vote
request RPCs. I couldn't get it to be totally robust, and the "feature flag"
used in the toggle is likely to become obselete quickly. So in the end I
decided to drop the test from the patch.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 23 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/8107/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has uploaded a new patch set (#4).

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().

This is a super targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that'll require more work.

The included test is pretty ugly, especially the "feature flag" part. It
failed 1 of 1000 runs in DEBUG mode with the number of votes being equal, so
it's probably not robust enough to be merged as is.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
4 files changed, 152 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/8107/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8107/6/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS6, Line 790:   // TODO(adar): should be replaced with explicit disabling/enabling of the
             :   // failure detector during elections.
should this be a jira instead?


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8107

to look at the new patch set (#2).

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().

This is a super targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that'll require more work.

The included test is pretty ugly, especially the "feature flag" part. It
failed 1 of 1000 runs in DEBUG mode with the number of votes being equal, so
it's probably not robust enough to be merged as is.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
4 files changed, 147 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/8107/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS3, Line 786: Note: the lock is only ever acquired via try_lock(); if it cannot be
             :   // acquired, an election must be in progress so the next one is skipped.
> Yes, the new version of updated comment in PS4 is clearer, thanks.
Yes. I think the cleanest way to do that is to modify the failure detector to be "one-shot", reenabling it when an election finishes. But I'm not sure of that approach yet because failure detection is only one of several paths into StartElection(), so I went with something simpler here.


http://gerrit.cloudera.org:8080/#/c/8107/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 3244:     // Drive up the number of elections by reducing the failure period.
> If it makes sense for this scenario, consider adding --raft_enable_pre_elec
Disabling pre-elections also cuts the number of elections (and thus votes) down by more than half, so I'd prefer to keep them enabled.


Line 3245:     "--raft_heartbeat_interval_ms=100"
> Would it make sense to decrease the default --leader_failure_max_missed_hea
I don't think it's strictly necessary. 300 ms to detect a failure is reasonable, and I'd like to avoid unnecessary volatility from due to slow/missed heartbeats.


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8107/3/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS3, Line 3257: ASSERT_OK(FindTabletLeader(tablet_servers_, tablet_id, kTimeout, &leader_tsd));
> Why not to move this under the loop scope so it would be the first operatio
I can move it into the start of the loop to replace the call at the end of the loop, but then I'll still need to add a call after the loop ends, in order to measure the vote count after the second test. That's because FindTabletLeader() isn't just used to find the leader, but also to wait for a leader to be voted into existence.

I don't follow your second point. The FindTabletLeader() call at the end of the loop writes to leader_tsd, so when we loop, leader_ts will be updated and should reflect the leader post-restart, no?


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8107/6/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS6, Line 790:   // TODO(adar): should be replaced with explicit disabling/enabling of the
             :   // failure detector during elections.
> should this be a jira instead?
I suppose so. I filed KUDU-2155.


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8107/2/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 97: METRIC_DECLARE_entity(tablet);
> warning: redundant 'FLAGS_consensus_rpc_timeout_ms' declaration [readabilit
Done


Line 99: METRIC_DECLARE_gauge_int64(raft_term);
> warning: redundant 'METRIC_ENTITY_server' declaration [readability-redundan
Done


Line 3272:         ASSERT_OK(cluster_->SetFlag(reconfigured_ts,
> warning: do not use 'else' after 'continue' [readability-else-after-return]
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2149: avoid election stacking by restoring failure monitor semantics

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No