You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/12/13 14:50:19 UTC

[kudu-CR] WIP: KUDU-699

Todd Lipcon has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5490

Change subject: WIP: KUDU-699
......................................................................

WIP: KUDU-699

Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/integration-tests/raft_consensus-itest.cc
6 files changed, 183 insertions(+), 123 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/5490/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5490

to look at the new patch set (#2).

Change subject: KUDU-699. consensus: Peer::Close() should not block on outstanding requests
......................................................................

KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Prior to this patch, Peer::Close() would wait on a semaphore which was
held by outbound RPC requests. This meant that, in the case of the
default 1 second consensus RPC timeout, Close() would commonly block for
a fairly long time.

This blocking wasn't necessary for semantic correctness: even if the
response came back with some kind of successful result, the result would
be ignored because the leader was already in the process of stepping
down. Instead, the blocking was an implementation detail: the
outstanding RPC requests needed to keep alive the RPC request/response
protobufs until they finished, and those protobufs are owned by the Peer
object.

In addition to being unnecessary, the blocking actually causes a couple
serious issues:

1) in an overloaded cluster, we often see a lot of leader election churn
and slow UpdateConsensus() calls. When UpdateConsensus() calls are slow,
this would cause leader step-down to also be slow because of the
PeerManager::Close() call taking a long time. This slow step-down
process would hold an RPC thread, increasing the possibility of other
RPCs getting rejected, retried, etc, contributing to the overall
problems on the cluster. This is often visible in 'pstack' as 5-10
threads stuck in 'PeerManager::Close()'

2) KUDU-1788 describes an issue in which the short timeout we're
currently using for consensus RPCs ends up resulting in those RPCs never
succeeding, and wasting a lot of network bandwidth with repeated
retries. Part of the solution to this issue is likely to involve
boosting the timeout. With a longer RPC timeout, the blocking behavior
on Close() described above is even more problematic.

This patch fixes the issue as follows:

1) Peers now have shared ownership and inherit from
std::enable_shared_from_this. When we make an RPC, we bind a shared_ptr
to the Peer in the RPC's callback. This ensures that the Peer will not
be destructed while an RPC is still in-flight.

2) We no longer need to use the 'sem_' variable to track whether an RPC
is in flight. The purpose of this semaphore was two-fold: (a) to cause
Close() to block, and (b) to prevent a second RPC from being sent while
one was already in flight. The former purpose is no longer a goal. The
latter purpose is attained just as easily using a simple boolean. So,
this patch removes the semaphore and instead just uses a
'request_pending_' boolean.

3) While making these changes, I removed the 'state_' member and
enum. The state was used to flag that Close() had been called, and to
flag whether the first request had been sent yet. I replaced the state
with two booleans, which I found simpler to reason about.

A new test is included which sets the consensus RPC timeout to be long,
stops, a follower, and then asks the leader to step down. Prior to this
patch, the step-down would take as long as the consensus RPC timeout and
cause the test to fail. With this patch, the step-down occurs immediately.

Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/integration-tests/raft_consensus-itest.cc
6 files changed, 175 insertions(+), 124 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/5490/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-699. consensus: Peer::Close() should not block on outstanding requests
......................................................................


Patch Set 3:

(1 comment)

OK, I looped the raft itest 1000 times in ASAN and consensus_peers-test 500 times in TSAN and the only failures are due to other unrelated known issues. So, I think this is good to go. Will rebase and submit (with Mike's nit fixed)

http://gerrit.cloudera.org:8080/#/c/5490/3/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

Line 245:     lock.unlock();
> nit: no need to explicitly unlock before return
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-699. consensus: Peer::Close() should not block on outstanding requests
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

LGTM, one minor nit

http://gerrit.cloudera.org:8080/#/c/5490/3/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

Line 245:     lock.unlock();
nit: no need to explicitly unlock before return


-- 
To view, visit http://gerrit.cloudera.org:8080/5490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-699. consensus: Peer::Close() should not block on outstanding requests
......................................................................


KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Prior to this patch, Peer::Close() would wait on a semaphore which was
held by outbound RPC requests. This meant that, in the case of the
default 1 second consensus RPC timeout, Close() would commonly block for
a fairly long time.

This blocking wasn't necessary for semantic correctness: even if the
response came back with some kind of successful result, the result would
be ignored because the leader was already in the process of stepping
down. Instead, the blocking was an implementation detail: the
outstanding RPC requests needed to keep alive the RPC request/response
protobufs until they finished, and those protobufs are owned by the Peer
object.

In addition to being unnecessary, the blocking actually causes a couple
serious issues:

1) in an overloaded cluster, we often see a lot of leader election churn
and slow UpdateConsensus() calls. When UpdateConsensus() calls are slow,
this would cause leader step-down to also be slow because of the
PeerManager::Close() call taking a long time. This slow step-down
process would hold an RPC thread, increasing the possibility of other
RPCs getting rejected, retried, etc, contributing to the overall
problems on the cluster. This is often visible in 'pstack' as 5-10
threads stuck in 'PeerManager::Close()'

2) KUDU-1788 describes an issue in which the short timeout we're
currently using for consensus RPCs ends up resulting in those RPCs never
succeeding, and wasting a lot of network bandwidth with repeated
retries. Part of the solution to this issue is likely to involve
boosting the timeout. With a longer RPC timeout, the blocking behavior
on Close() described above is even more problematic.

This patch fixes the issue as follows:

1) Peers now have shared ownership and inherit from
std::enable_shared_from_this. When we make an RPC, we bind a shared_ptr
to the Peer in the RPC's callback. This ensures that the Peer will not
be destructed while an RPC is still in-flight.

2) We no longer need to use the 'sem_' variable to track whether an RPC
is in flight. The purpose of this semaphore was two-fold: (a) to cause
Close() to block, and (b) to prevent a second RPC from being sent while
one was already in flight. The former purpose is no longer a goal. The
latter purpose is attained just as easily using a simple boolean. So,
this patch removes the semaphore and instead just uses a
'request_pending_' boolean.

3) While making these changes, I removed the 'state_' member and
enum. The state was used to flag that Close() had been called, and to
flag whether the first request had been sent yet. I replaced the state
with two booleans, which I found simpler to reason about.

A new test is included which sets the consensus RPC timeout to be long,
stops, a follower, and then asks the leader to step down. Prior to this
patch, the step-down would take as long as the consensus RPC timeout and
cause the test to fail. With this patch, the step-down occurs immediately.

Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
Reviewed-on: http://gerrit.cloudera.org:8080/5490
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/integration-tests/raft_consensus-itest.cc
6 files changed, 174 insertions(+), 124 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-699. consensus: Peer::Close() should not block on outstanding requests
......................................................................


Patch Set 2: Verified-1

Looping raft itest in TSAN caught a lock-order inversion with this. Needs another rev, but looks like nothing major. Please take a look.

-- 
To view, visit http://gerrit.cloudera.org:8080/5490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5490

to look at the new patch set (#4).

Change subject: KUDU-699. consensus: Peer::Close() should not block on outstanding requests
......................................................................

KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Prior to this patch, Peer::Close() would wait on a semaphore which was
held by outbound RPC requests. This meant that, in the case of the
default 1 second consensus RPC timeout, Close() would commonly block for
a fairly long time.

This blocking wasn't necessary for semantic correctness: even if the
response came back with some kind of successful result, the result would
be ignored because the leader was already in the process of stepping
down. Instead, the blocking was an implementation detail: the
outstanding RPC requests needed to keep alive the RPC request/response
protobufs until they finished, and those protobufs are owned by the Peer
object.

In addition to being unnecessary, the blocking actually causes a couple
serious issues:

1) in an overloaded cluster, we often see a lot of leader election churn
and slow UpdateConsensus() calls. When UpdateConsensus() calls are slow,
this would cause leader step-down to also be slow because of the
PeerManager::Close() call taking a long time. This slow step-down
process would hold an RPC thread, increasing the possibility of other
RPCs getting rejected, retried, etc, contributing to the overall
problems on the cluster. This is often visible in 'pstack' as 5-10
threads stuck in 'PeerManager::Close()'

2) KUDU-1788 describes an issue in which the short timeout we're
currently using for consensus RPCs ends up resulting in those RPCs never
succeeding, and wasting a lot of network bandwidth with repeated
retries. Part of the solution to this issue is likely to involve
boosting the timeout. With a longer RPC timeout, the blocking behavior
on Close() described above is even more problematic.

This patch fixes the issue as follows:

1) Peers now have shared ownership and inherit from
std::enable_shared_from_this. When we make an RPC, we bind a shared_ptr
to the Peer in the RPC's callback. This ensures that the Peer will not
be destructed while an RPC is still in-flight.

2) We no longer need to use the 'sem_' variable to track whether an RPC
is in flight. The purpose of this semaphore was two-fold: (a) to cause
Close() to block, and (b) to prevent a second RPC from being sent while
one was already in flight. The former purpose is no longer a goal. The
latter purpose is attained just as easily using a simple boolean. So,
this patch removes the semaphore and instead just uses a
'request_pending_' boolean.

3) While making these changes, I removed the 'state_' member and
enum. The state was used to flag that Close() had been called, and to
flag whether the first request had been sent yet. I replaced the state
with two booleans, which I found simpler to reason about.

A new test is included which sets the consensus RPC timeout to be long,
stops, a follower, and then asks the leader to step down. Prior to this
patch, the step-down would take as long as the consensus RPC timeout and
cause the test to fail. With this patch, the step-down occurs immediately.

Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/integration-tests/raft_consensus-itest.cc
6 files changed, 174 insertions(+), 124 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/5490/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-699. consensus: Peer::Close() should not block on outstanding requests
......................................................................


Patch Set 4: Code-Review+2

Carrying +2 after a trivial nit fix and rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/5490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-699. consensus: Peer::Close() should not block on outstanding requests

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-699. consensus: Peer::Close() should not block on outstanding requests
......................................................................


Patch Set 3: Verified-1

Still want to do some more dist-test loops on this one before commit. I saw a couple issues in the latest runs I did, but not sure if they are caused by this or not and need investigation

-- 
To view, visit http://gerrit.cloudera.org:8080/5490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4e1bc80f536defad28f4d7b51fb95aa32dc9fca0
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No