You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2019/12/27 06:31:14 UTC

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14954


Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................

[consensus] short-circuit response path for RequestVote()

I saw cases of contention on replica's RaftConsensus lock when the
filesystem was slow on updating Raft metadata files after receiving
Raft update message from a leader replica, while there was a vote
request from another follower replica waiting on the RaftConsensus
object's lock.  The latter request would be simply rejected after just
after acquiring the lock because of the recent updates on the voting
withhold interval.

This patch updates the code by moving the last-heard-from-leader check
into the very beginning of the method, so it's possible to respond NO
to a vote request without acquiring the lock in case if receiving Raft
heartbeats from the leader replica just recently.  It should help a bit
with the overflow of the RaftConsensus RPC queue during election storms.

Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 68 insertions(+), 61 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14954/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14954

to look at the new patch set (#5).

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................

[consensus] short-circuit response path for RequestVote()

I saw cases of contention on replica's RaftConsensus lock when the
filesystem was slow on updating Raft metadata files after receiving
Raft update message from a leader replica, while there was a vote
request from another follower replica waiting on the RaftConsensus
object's lock.  The latter request would be simply rejected after just
after acquiring the lock because of the recent updates on the voting
withhold interval.

This patch updates the code by moving the last-heard-from-leader check
into the very beginning of the method, so it's possible to respond NO
to a vote request without acquiring the lock in case if receiving Raft
heartbeats from the leader replica just recently.  It should help a bit
with the overflow of the RaftConsensus RPC queue during election storms.

Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/util/monotime.cc
M src/kudu/util/monotime.h
4 files changed, 93 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14954/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14954 )

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................


Patch Set 3: Verified+1

Unrelated test failure: https://issues.apache.org/jira/browse/KUDU-2501


-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 28 Dec 2019 00:17:17 +0000
Gerrit-HasComments: No

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14954 )

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.cc@198
PS3, Line 198:   DCHECK(std::atomic_is_lock_free(&withhold_votes_until_));
> Yeah I think dropping it is fine. Even on other architectures, this will be
Done: I removed the DCHECK().



-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 06 Jan 2020 19:01:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14954 )

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................

[consensus] short-circuit response path for RequestVote()

I saw cases of contention on replica's RaftConsensus lock when the
filesystem was slow on updating Raft metadata files after receiving
Raft update message from a leader replica, while there was a vote
request from another follower replica waiting on the RaftConsensus
object's lock.  The latter request would be simply rejected after just
after acquiring the lock because of the recent updates on the voting
withhold interval.

This patch updates the code by moving the last-heard-from-leader check
into the very beginning of the method, so it's possible to respond NO
to a vote request without acquiring the lock in case if receiving Raft
heartbeats from the leader replica just recently.  It should help a bit
with the overflow of the RaftConsensus RPC queue during election storms.

Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Reviewed-on: http://gerrit.cloudera.org:8080/14954
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/util/monotime.cc
M src/kudu/util/monotime.h
4 files changed, 93 insertions(+), 64 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Adar Dembo: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14954 )

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.h@894
PS3, Line 894: reponsed
responded to


http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.h@921
PS3, Line 921:  this
             :   // field assigned only when 'lock_' is held.
That aspect of the field's use is tangential to its synchronization, right? Meaning, if for whatever reason we stopped holding lock_ when writing to this field, there wouldn't be any inconsistencies, right?

If that's the case, I'd reword the comment to explain that only the std::atomic<> wrapping is necessary for synchronization, and maybe not even mention the lock_ stuff at all.


http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.cc@198
PS3, Line 198:   DCHECK(std::atomic_is_lock_free(&withhold_votes_until_));
Seems like this could be asserted at compile-time instead. Or not at all; isn't it obvious from inspection that accesses to a std::atomic<int64_t> will always be atomic?


http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.cc@1671
PS3, Line 1671:     return RequestVoteRespondLeaderIsAlive(request, response);
Should we recheck this condition after acquiring lock_ too?



-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 29 Dec 2019 18:35:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14954 )

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................


Patch Set 5:

> Unrelated test failure in org.apache.kudu.client.TestAsyncKuduSession

That was an already known issue existing long before this patch: https://issues.apache.org/jira/browse/KUDU-2956


-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 06 Jan 2020 20:03:41 +0000
Gerrit-HasComments: No

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14954 )

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 06 Jan 2020 21:10:58 +0000
Gerrit-HasComments: No

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14954

to look at the new patch set (#4).

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................

[consensus] short-circuit response path for RequestVote()

I saw cases of contention on replica's RaftConsensus lock when the
filesystem was slow on updating Raft metadata files after receiving
Raft update message from a leader replica, while there was a vote
request from another follower replica waiting on the RaftConsensus
object's lock.  The latter request would be simply rejected after just
after acquiring the lock because of the recent updates on the voting
withhold interval.

This patch updates the code by moving the last-heard-from-leader check
into the very beginning of the method, so it's possible to respond NO
to a vote request without acquiring the lock in case if receiving Raft
heartbeats from the leader replica just recently.  It should help a bit
with the overflow of the RaftConsensus RPC queue during election storms.

Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/util/monotime.cc
M src/kudu/util/monotime.h
4 files changed, 94 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14954/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14954 )

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.cc@198
PS3, Line 198:   DCHECK(std::atomic_is_lock_free(&withhold_votes_until_));
> Per this C++ reference, std::atomic<> might be implemented not via lock-fre
Yeah I think dropping it is fine. Even on other architectures, this will be implemented with a lock, so it's not like it'd be unsafe; it'd just be less performant.



-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 30 Dec 2019 18:45:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14954 )

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.h@894
PS3, Line 894: reponsed
> responded to
Done


http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.h@921
PS3, Line 921:  this
             :   // field assigned only when 'lock_' is held.
> That aspect of the field's use is tangential to its synchronization, right?
Done.


http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.cc@198
PS3, Line 198:   DCHECK(std::atomic_is_lock_free(&withhold_votes_until_));
> Seems like this could be asserted at compile-time instead. Or not at all; i
Per this C++ reference, std::atomic<> might be implemented not via lock-free atomic CPU primitives, and only std::atomic_flag is guaranteed to be implemented using those:

  https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free

I wish it would be possible to have a simpler compile-time check instead of this, but such thing is available only starting C++17 (see std::atomic<T>::is_always_lock_free).

Probably, targeting only x86_64 architecture, we can safely drop this.  What do you think?


http://gerrit.cloudera.org:8080/#/c/14954/3/src/kudu/consensus/raft_consensus.cc@1671
PS3, Line 1671:     return RequestVoteRespondLeaderIsAlive(request, response);
> Should we recheck this condition after acquiring lock_ too?
I thought about this, and it seemed to me that it wouldn't make much difference.  The chances that a heartbeat from the leader arrives between checking for the condition here and later while under the lock.

I gave it a second thought, though.  I think it won't hurt to recheck for the condition under the lock again: it costs a little, but it helps to avoid extra elections and disruptive transfers of the leadership.



-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 30 Dec 2019 04:44:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14954

to look at the new patch set (#2).

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................

[consensus] short-circuit response path for RequestVote()

I saw cases of contention on replica's RaftConsensus lock when the
filesystem was slow on updating Raft metadata files after receiving
Raft update message from a leader replica, while there was a vote
request from another follower replica waiting on the RaftConsensus
object's lock.  The latter request would be simply rejected after just
after acquiring the lock because of the recent updates on the voting
withhold interval.

This patch updates the code by moving the last-heard-from-leader check
into the very beginning of the method, so it's possible to respond NO
to a vote request without acquiring the lock in case if receiving Raft
heartbeats from the leader replica just recently.  It should help a bit
with the overflow of the RaftConsensus RPC queue during election storms.

Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/util/monotime.cc
M src/kudu/util/monotime.h
4 files changed, 76 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14954/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14954 )

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................


Patch Set 5: Verified+1

Unrelated test failure in org.apache.kudu.client.TestAsyncKuduSession


-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 06 Jan 2020 19:51:07 +0000
Gerrit-HasComments: No

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] short-circuit response path for RequestVote()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14954

to look at the new patch set (#3).

Change subject: [consensus] short-circuit response path for RequestVote()
......................................................................

[consensus] short-circuit response path for RequestVote()

I saw cases of contention on replica's RaftConsensus lock when the
filesystem was slow on updating Raft metadata files after receiving
Raft update message from a leader replica, while there was a vote
request from another follower replica waiting on the RaftConsensus
object's lock.  The latter request would be simply rejected after just
after acquiring the lock because of the recent updates on the voting
withhold interval.

This patch updates the code by moving the last-heard-from-leader check
into the very beginning of the method, so it's possible to respond NO
to a vote request without acquiring the lock in case if receiving Raft
heartbeats from the leader replica just recently.  It should help a bit
with the overflow of the RaftConsensus RPC queue during election storms.

Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/util/monotime.cc
M src/kudu/util/monotime.h
4 files changed, 89 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14954/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67efef72b74ce243ca060e89fcec6eb11e95e8e8
Gerrit-Change-Number: 14954
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)