You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org> on 2020/12/22 20:50:18 UTC

[kudu-CR] [consensus] Allow sending status-only request messages to FAILED peer

Bankim Bhavsar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16899


Change subject: [consensus] Allow sending status-only request messages to FAILED peer
......................................................................

[consensus] Allow sending status-only request messages to FAILED peer

This change adds the ability for a leader to send status-only
messages to a peer even if it's in FAILED_UNRECOVERABLE state.
This ability is turned off by default and controlled via
a PeerMessageQueue parameter.

Without this change when the system catalog is copied externally
the new master remains in FAILED_UNRECOVERABLE state and doesn't get
promoted to being a VOTER despite the system catalog being up to date.
The procedure for end-to-end testing that hooks up masters to use
this Raft config is a separate change.

Change-Id: I229cc739c1b5ec7b11ce05d5e6b1b8e9d654d6f7
---
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
5 files changed, 99 insertions(+), 11 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/16899/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I229cc739c1b5ec7b11ce05d5e6b1b8e9d654d6f7
Gerrit-Change-Number: 16899
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>

[kudu-CR] [consensus] Allow sending status-only request messages to FAILED peer

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16899 )

Change subject: [consensus] Allow sending status-only request messages to FAILED peer
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue-test.cc
File src/kudu/consensus/consensus_queue-test.cc:

http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue-test.cc@413
PS1, Line 413:   request.mutable_ops()->ExtractSubrange(0, request.ops_size(), nullptr);
I might be missing something -- what's this for?


http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.h@551
PS1, Line 551: ShouldSendLogMessagesToPeer
nit: this appears to only be used once, so maybe just inline it? That way readers don't need to jump between the def and usage. The details also make a lot more sense closer to the usage -- it's otherwise a bit unclear how callers should use this function.


http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.h@561
PS1, Line 561: (allow_status_msg_for_failed_peer_ == nullptr || !*allow_status_msg_for_failed_peer_ ||
             :             peer.wal_catchup_possible);
nit: up to you, but I found it much easier to reason about this written as

 !(allow_status_msg_for_failed_peer_ && *allow_status_msg_for_peer_ && !peer.wal_catchup_possible)

Alternatively invert the entire thing and have name the function ShouldSendStatusOnlyMessages


http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.cc@711
PS1, Line 711: fault_
nit: add PREDICT_FALSE


http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.cc@713
PS1, Line 713: Lo
nit: mind adding something that makes this obvious the message is an injected failure? Elsewhere we just use "INJECTED FAILURE".



-- 
To view, visit http://gerrit.cloudera.org:8080/16899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229cc739c1b5ec7b11ce05d5e6b1b8e9d654d6f7
Gerrit-Change-Number: 16899
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 04 Jan 2021 07:05:23 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] Allow sending status-only request messages to FAILED peer

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16899 )

Change subject: [consensus] Allow sending status-only request messages to FAILED peer
......................................................................

[consensus] Allow sending status-only request messages to FAILED peer

This change adds the ability for a leader to send status-only
messages to a peer even if it's in FAILED_UNRECOVERABLE state.
This ability is turned off by default and controlled via
a PeerMessageQueue parameter.

Without this change when the system catalog is copied externally
the new master remains in FAILED_UNRECOVERABLE state and doesn't get
promoted to being a VOTER despite the system catalog being up to date.
The procedure for end-to-end testing that hooks up masters to use
this Raft config is a separate change.

Change-Id: I229cc739c1b5ec7b11ce05d5e6b1b8e9d654d6f7
Reviewed-on: http://gerrit.cloudera.org:8080/16899
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
5 files changed, 92 insertions(+), 10 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/16899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I229cc739c1b5ec7b11ce05d5e6b1b8e9d654d6f7
Gerrit-Change-Number: 16899
Gerrit-PatchSet: 3
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] Allow sending status-only request messages to FAILED peer

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16899 )

Change subject: [consensus] Allow sending status-only request messages to FAILED peer
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue-test.cc
File src/kudu/consensus/consensus_queue-test.cc:

http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue-test.cc@413
PS1, Line 413:   request.mutable_ops()->ExtractSubrange(0, request.ops_size(), nullptr);
> I might be missing something -- what's this for?
In this case it's not needed with 0 ops. So I'll remove it.


http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.h@551
PS1, Line 551: ShouldSendLogMessagesToPeer
> nit: this appears to only be used once, so maybe just inline it? That way r
Done


http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.h@561
PS1, Line 561: (allow_status_msg_for_failed_peer_ == nullptr || !*allow_status_msg_for_failed_peer_ ||
             :             peer.wal_catchup_possible);
> nit: up to you, but I found it much easier to reason about this written as
Hmm. I don't find one more readable than the other. I'll keep it as is.


http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.cc@711
PS1, Line 711: fault_
> nit: add PREDICT_FALSE
Done


http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.cc@713
PS1, Line 713: Lo
> nit: mind adding something that makes this obvious the message is an inject
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229cc739c1b5ec7b11ce05d5e6b1b8e9d654d6f7
Gerrit-Change-Number: 16899
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 06 Jan 2021 00:30:35 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] Allow sending status-only request messages to FAILED peer

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16899

to look at the new patch set (#2).

Change subject: [consensus] Allow sending status-only request messages to FAILED peer
......................................................................

[consensus] Allow sending status-only request messages to FAILED peer

This change adds the ability for a leader to send status-only
messages to a peer even if it's in FAILED_UNRECOVERABLE state.
This ability is turned off by default and controlled via
a PeerMessageQueue parameter.

Without this change when the system catalog is copied externally
the new master remains in FAILED_UNRECOVERABLE state and doesn't get
promoted to being a VOTER despite the system catalog being up to date.
The procedure for end-to-end testing that hooks up masters to use
this Raft config is a separate change.

Change-Id: I229cc739c1b5ec7b11ce05d5e6b1b8e9d654d6f7
---
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
5 files changed, 92 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/16899/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I229cc739c1b5ec7b11ce05d5e6b1b8e9d654d6f7
Gerrit-Change-Number: 16899
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] Allow sending status-only request messages to FAILED peer

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16899 )

Change subject: [consensus] Allow sending status-only request messages to FAILED peer
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/16899/1/src/kudu/consensus/consensus_queue.h@561
PS1, Line 561: 
             :   // The id of the tablet.
> Hmm. I don't find one more readable than the other. I'll keep it as is.
Sure, the case for the &&ed version being more readable is that you could easily summarize it as:

 !(allow_status_messages_on_failure && is_failed)

in which case we should send a status message, not a log message. That IMO is more straightforward than reasoning about the ||s. Though I don't feel strongly about it.



-- 
To view, visit http://gerrit.cloudera.org:8080/16899
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I229cc739c1b5ec7b11ce05d5e6b1b8e9d654d6f7
Gerrit-Change-Number: 16899
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 06 Jan 2021 01:17:22 +0000
Gerrit-HasComments: Yes