You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2020/06/02 08:14:17 UTC
[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16021
Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................
[consensus] small cleanup on Peer::SignalRequest()
It's possible not to hold 'peer_lock_' while scheduling calling
SendNextRequest() on the thread pool: if 'request_pending_' or 'closed_'
changes, it's handled by Peer::SendNextRequest() as necessary. It might
be an extra wake up of a thread in Raft pool in worst the case, but it
seems better than blocking the thread running PrepareTask(). Blocking
the latter involves blocking other threads on corresponding locks
and slowing the replication of a Raft operation, where the latter leads
to lower throughput for write operations.
The motivation for this change was seeing the following in the
logs during intensive ingest workload:
0531 13:09:40.403224 (+515724us) spinlock_profiling.cc:244] Waited 514 ms on lock 0xa1581150
with the stack symbolized like below:
kudu::(anonymous namespace)::SubmitSpinLockProfileData(void const*, long)
src/kudu/util/spinlock_profiling.cc:230
kudu::consensus::Peer::SignalRequest(bool)
src/kudu/gutil/spinlock.h:106
kudu::consensus::PeerManager::SignalRequest(bool)
src/kudu/consensus/peer_manager.cc:98
kudu::consensus::RaftConsensus::Replicate(scoped_refptr<kudu::consensus::ConsensusRound> const&)
src/kudu/consensus/raft_consensus.cc:740 (discriminator 2)
kudu::tablet::TransactionDriver::Prepare()
src/kudu/tablet/transactions/transaction_driver.cc:333
kudu::tablet::TransactionDriver::PrepareTask()
src/kudu/tablet/transactions/transaction_driver.cc:243
kudu::ThreadPool::DispatchThread()
src/kudu/util/threadpool.cc:686
I also did other minor cleanup on the code around.
Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_tracker.cc
6 files changed, 36 insertions(+), 37 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/16021/1
--
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16021 )
Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................
[consensus] small cleanup on Peer::SignalRequest()
It's possible not to hold 'peer_lock_' when scheduling SendNextRequest()
on the thread pool: if 'request_pending_' or 'closed_' changes, it's
handled by Peer::SendNextRequest() as necessary. It might be an extra
run a thread in Raft pool in the worst case, but it seems better than
blocking the thread running PrepareTask(). Blocking the latter involves
blocking other tasks on corresponding locks and slowing the replication
of a Raft operation, which leads to lower write throughput.
The motivation for this change was seeing the following in the logs
during intensive ingest workloads:
0531 13:09:40.403224 (+515724us) spinlock_profiling.cc:244] Waited 514 ms on lock 0xa1581150
with the stack symbolized like below:
kudu::(anonymous namespace)::SubmitSpinLockProfileData(void const*, long)
src/kudu/util/spinlock_profiling.cc:230
kudu::consensus::Peer::SignalRequest(bool)
src/kudu/gutil/spinlock.h:106
kudu::consensus::PeerManager::SignalRequest(bool)
src/kudu/consensus/peer_manager.cc:98
kudu::consensus::RaftConsensus::Replicate(scoped_refptr<kudu::consensus::ConsensusRound> const&)
src/kudu/consensus/raft_consensus.cc:740 (discriminator 2)
kudu::tablet::TransactionDriver::Prepare()
src/kudu/tablet/transactions/transaction_driver.cc:333
kudu::tablet::TransactionDriver::PrepareTask()
src/kudu/tablet/transactions/transaction_driver.cc:243
kudu::ThreadPool::DispatchThread()
src/kudu/util/threadpool.cc:686
The stack trace pointed to 'peer_lock_' in Peer::SignalRequest(),
consensus_peer.cc,167 as in one of its previous revisions, as the
rest of the stack.
I also did other minor cleanup on the code around.
Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Reviewed-on: http://gerrit.cloudera.org:8080/16021
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_tracker.cc
6 files changed, 55 insertions(+), 46 deletions(-)
Approvals:
Kudu Jenkins: Verified
Andrew Wong: Looks good to me, approved
--
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()
Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16021 )
Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................
Patch Set 2: Code-Review+2
You mentioned this to me on Slack:
"i also ran several micro-benches, they either showed 4-5% improvement or at least no downside"
It'd be nice to incorporate this in the commit message, or even better, add a small benchmark tracking the spinlock contention time in a low-heartbeat interval, write-heavy workload or somesuch.
--
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 03 Jun 2020 22:39:44 +0000
Gerrit-HasComments: No
[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()
Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16021 )
Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................
Patch Set 1: Code-Review+1
--
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Jun 2020 22:12:53 +0000
Gerrit-HasComments: No
[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Bankim Bhavsar,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/16021
to look at the new patch set (#2).
Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................
[consensus] small cleanup on Peer::SignalRequest()
It's possible not to hold 'peer_lock_' when scheduling SendNextRequest()
on the thread pool: if 'request_pending_' or 'closed_' changes, it's
handled by Peer::SendNextRequest() as necessary. It might be an extra
run a thread in Raft pool in the worst case, but it seems better than
blocking the thread running PrepareTask(). Blocking the latter involves
blocking other tasks on corresponding locks and slowing the replication
of a Raft operation, which leads to lower write throughput.
The motivation for this change was seeing the following in the logs
during intensive ingest workloads:
0531 13:09:40.403224 (+515724us) spinlock_profiling.cc:244] Waited 514 ms on lock 0xa1581150
with the stack symbolized like below:
kudu::(anonymous namespace)::SubmitSpinLockProfileData(void const*, long)
src/kudu/util/spinlock_profiling.cc:230
kudu::consensus::Peer::SignalRequest(bool)
src/kudu/gutil/spinlock.h:106
kudu::consensus::PeerManager::SignalRequest(bool)
src/kudu/consensus/peer_manager.cc:98
kudu::consensus::RaftConsensus::Replicate(scoped_refptr<kudu::consensus::ConsensusRound> const&)
src/kudu/consensus/raft_consensus.cc:740 (discriminator 2)
kudu::tablet::TransactionDriver::Prepare()
src/kudu/tablet/transactions/transaction_driver.cc:333
kudu::tablet::TransactionDriver::PrepareTask()
src/kudu/tablet/transactions/transaction_driver.cc:243
kudu::ThreadPool::DispatchThread()
src/kudu/util/threadpool.cc:686
The stack trace pointed to 'peer_lock_' in Peer::SignalRequest(),
consensus_peer.cc,167 as in one of its previous revisions, as the
rest of the stack.
I also did other minor cleanup on the code around.
Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_tracker.cc
6 files changed, 55 insertions(+), 46 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/16021/2
--
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()
Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16021 )
Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................
Patch Set 1:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG@12
PS1, Line 12: worst the ca
nit: the worst case
http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG@14
PS1, Line 14: threads
nit: blocking other tasks?
http://gerrit.cloudera.org:8080/#/c/16021/1/src/kudu/consensus/consensus_peers.h
File src/kudu/consensus/consensus_peers.h:
http://gerrit.cloudera.org:8080/#/c/16021/1/src/kudu/consensus/consensus_peers.h@197
PS1, Line 197: // Lock that protects Peer state changes, initialization, etc.
Could you update this to describe the new semantics of locking? E.g. seems like we don't need to hold peer_lock_ to read closed_, but we need it to update it. What other behaviors do callers need to be aware of?
--
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Jun 2020 22:39:40 +0000
Gerrit-HasComments: Yes
[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16021 )
Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................
Patch Set 2:
> You mentioned this to me on Slack:
> "i also ran several micro-benches, they either showed 4-5%
> improvement or at least no downside"
>
> It'd be nice to incorporate this in the commit message, or even
> better, add a small benchmark tracking the spinlock contention time
> in a low-heartbeat interval, write-heavy workload or somesuch.
Yeah, I'm working on the test.
Thank you for the review!
--
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 04 Jun 2020 01:40:20 +0000
Gerrit-HasComments: No
[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16021 )
Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................
Patch Set 1:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG@12
PS1, Line 12: worst the ca
> nit: the worst case
Done
http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG@14
PS1, Line 14: threads
> nit: blocking other tasks?
Done
http://gerrit.cloudera.org:8080/#/c/16021/1/src/kudu/consensus/consensus_peers.h
File src/kudu/consensus/consensus_peers.h:
http://gerrit.cloudera.org:8080/#/c/16021/1/src/kudu/consensus/consensus_peers.h@197
PS1, Line 197: // Lock that protects Peer state changes, initialization, etc.
> Could you update this to describe the new semantics of locking? E.g. seems
Done
--
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 03 Jun 2020 03:29:08 +0000
Gerrit-HasComments: Yes