Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/12/15 09:33:49 UTC

[kudu-CR] KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5490

to look at the new patch set (#2).

Change subject: KUDU-699. consensus: Peer::Close() should not block on outstanding requests
......................................................................

KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Prior to this patch, Peer::Close() would wait on a semaphore which was
held by outbound RPC requests. This meant that, in the case of the
default 1 second consensus RPC timeout, Close() would commonly block for
a fairly long time.

This blocking wasn't necessary for semantic correctness: even if the
response came back with some kind of successful result, the result would
be ignored because the leader was already in the process of stepping
down. Instead, the blocking was an implementation detail: the
outstanding RPC requests needed to keep alive the RPC request/response
protobufs until they finished, and those protobufs are owned by the Peer
object.
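
In rough form, the old pattern looked like this (a minimal sketch with
simplified, partly hypothetical names; std::binary_semaphore stands in
for Kudu's own semaphore type):

  #include <semaphore>  // C++20

  class Peer {
   public:
    void SendNextRequest() {
      sem_.acquire();  // held for the lifetime of the outbound RPC;
                       // the response callback calls sem_.release()
      // ... start the async UpdateConsensus() RPC ...
    }

    void Close() {
      // Blocks until the in-flight RPC releases the semaphore, i.e.
      // up to the full consensus RPC timeout (1 second by default).
      sem_.acquire();
      // ... now safe to tear down the request/response protobufs
      // owned by this Peer ...
    }

   private:
    std::binary_semaphore sem_{1};
  };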

In addition to being unnecessary, the blocking actually causes a couple
of serious issues:

1) In an overloaded cluster, we often see a lot of leader election churn
and slow UpdateConsensus() calls. When UpdateConsensus() calls were
slow, leader step-down would also be slow, because the
PeerManager::Close() call would take a long time. This slow step-down
process would hold an RPC thread, increasing the possibility of other
RPCs getting rejected, retried, etc., and contributing to the overall
problems on the cluster. This is often visible in 'pstack' as 5-10
threads stuck in 'PeerManager::Close()'.

2) KUDU-1788 describes an issue in which the short timeout we're
currently using for consensus RPCs ends up resulting in those RPCs never
succeeding, and wasting a lot of network bandwidth with repeated
retries. Part of the solution to this issue is likely to involve
boosting the timeout. With a longer RPC timeout, the blocking behavior
on Close() described above is even more problematic.

This patch fixes the issue as follows (see the sketch after this list):

1) Peers now have shared ownership and inherit from
std::enable_shared_from_this. When we make an RPC, we bind a shared_ptr
to the Peer in the RPC's callback. This ensures that the Peer will not
be destructed while an RPC is still in-flight.

2) We no longer need to use the 'sem_' variable to track whether an RPC
is in flight. The purpose of this semaphore was two-fold: (a) to cause
Close() to block, and (b) to prevent a second RPC from being sent while
one was already in flight. The former purpose is no longer a goal. The
latter purpose is attained just as easily using a simple boolean. So,
this patch removes the semaphore and instead just uses a
'request_pending_' boolean.

3) While making these changes, I removed the 'state_' member and
enum. The state was used to flag that Close() had been called, and to
flag whether the first request had been sent yet. I replaced the state
with two booleans, which I found simpler to reason about.
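
Putting the three changes together, a minimal sketch of the new shape
(simplified, partly hypothetical names; the proxy stub is illustrative,
not Kudu's actual RPC interface):

  #include <functional>
  #include <memory>
  #include <mutex>

  // Hypothetical stand-in for the async consensus RPC proxy.
  struct Proxy {
    void UpdateAsync(std::function<void()> cb) { done = std::move(cb); }
    std::function<void()> done;  // invoked later by the reactor thread
  };

  class Peer : public std::enable_shared_from_this<Peer> {
   public:
    explicit Peer(Proxy* proxy) : proxy_(proxy) {}

    void SendNextRequest() {
      std::lock_guard<std::mutex> l(lock_);
      if (closed_ || request_pending_) return;  // boolean replaces 'sem_'
      request_pending_ = true;
      // Binding shared_from_this() into the callback keeps the Peer
      // (and the request/response protobufs it owns) alive until the
      // RPC completes, so Close() need not wait for it.
      std::shared_ptr<Peer> self = shared_from_this();
      proxy_->UpdateAsync([self] { self->ProcessResponse(); });
    }

    void Close() {
      std::lock_guard<std::mutex> l(lock_);
      closed_ = true;  // returns immediately; no blocking on in-flight RPCs
    }

   private:
    void ProcessResponse() {
      std::lock_guard<std::mutex> l(lock_);
      request_pending_ = false;
      if (closed_) return;  // late responses are simply ignored
      // ... handle the response, possibly send the next request ...
    }

    Proxy* proxy_;
    std::mutex lock_;
    bool closed_ = false;           // with 'request_pending_', replaces 'state_'
    bool request_pending_ = false;  // replaces the old 'sem_'
  };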

A new test is included which sets the consensus RPC timeout to be long,
stops a follower, and then asks the leader to step down. Prior to this
patch, the step-down would take as long as the consensus RPC timeout and
cause the test to fail. With this patch, the step-down occurs immediately.
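
The shape of that test, in rough outline (the helper names here are
hypothetical stand-ins, not Kudu's actual test utilities):

  #include <chrono>
  #include <gtest/gtest.h>

  TEST_F(RaftConsensusITest, TestStepDownDoesNotBlockOnSlowRpc) {
    // Hypothetical helpers: raise the consensus RPC timeout well above
    // the default, then stop one follower so an RPC stays in-flight.
    SetConsensusRpcTimeoutSecs(60);
    StopFollower(0);

    auto start = std::chrono::steady_clock::now();
    AskLeaderToStepDown();  // should return without waiting out the RPC
    auto elapsed = std::chrono::steady_clock::now() - start;

    // Before the fix, this took roughly the RPC timeout; now it should
    // be nearly immediate.
    ASSERT_LT(std::chrono::duration_cast<std::chrono::seconds>(elapsed)
                  .count(), 5);
  }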

Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/integration-tests/raft_consensus-itest.cc
6 files changed, 175 insertions(+), 124 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/5490/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot