You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2020/06/02 08:14:17 UTC

[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16021


Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................

[consensus] small cleanup on Peer::SignalRequest()

It's possible not to hold 'peer_lock_' while scheduling calling
SendNextRequest() on the thread pool: if 'request_pending_' or 'closed_'
changes, it's handled by Peer::SendNextRequest() as necessary.  It might
be an extra wake up of a thread in Raft pool in worst the case, but it
seems better than blocking the thread running PrepareTask().  Blocking
the latter involves blocking other threads on corresponding locks
and slowing the replication of a Raft operation, where the latter leads
to lower throughput for write operations.

The motivation for this change was seeing the following in the
logs during intensive ingest workload:

  0531 13:09:40.403224 (+515724us) spinlock_profiling.cc:244] Waited 514 ms on lock 0xa1581150

with the stack symbolized like below:

  kudu::(anonymous namespace)::SubmitSpinLockProfileData(void const*, long)
    src/kudu/util/spinlock_profiling.cc:230
  kudu::consensus::Peer::SignalRequest(bool)
    src/kudu/gutil/spinlock.h:106
  kudu::consensus::PeerManager::SignalRequest(bool)
    src/kudu/consensus/peer_manager.cc:98
  kudu::consensus::RaftConsensus::Replicate(scoped_refptr<kudu::consensus::ConsensusRound> const&)
    src/kudu/consensus/raft_consensus.cc:740 (discriminator 2)
  kudu::tablet::TransactionDriver::Prepare()
    src/kudu/tablet/transactions/transaction_driver.cc:333
  kudu::tablet::TransactionDriver::PrepareTask()
    src/kudu/tablet/transactions/transaction_driver.cc:243
  kudu::ThreadPool::DispatchThread()
    src/kudu/util/threadpool.cc:686

I also did other minor cleanup on the code around.

Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_tracker.cc
6 files changed, 36 insertions(+), 37 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/16021/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16021 )

Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................

[consensus] small cleanup on Peer::SignalRequest()

It's possible not to hold 'peer_lock_' when scheduling SendNextRequest()
on the thread pool: if 'request_pending_' or 'closed_' changes, it's
handled by Peer::SendNextRequest() as necessary.  It might be an extra
run a thread in Raft pool in the worst case, but it seems better than
blocking the thread running PrepareTask().  Blocking the latter involves
blocking other tasks on corresponding locks and slowing the replication
of a Raft operation, which leads to lower write throughput.

The motivation for this change was seeing the following in the logs
during intensive ingest workloads:

  0531 13:09:40.403224 (+515724us) spinlock_profiling.cc:244] Waited 514 ms on lock 0xa1581150

with the stack symbolized like below:

  kudu::(anonymous namespace)::SubmitSpinLockProfileData(void const*, long)
    src/kudu/util/spinlock_profiling.cc:230
  kudu::consensus::Peer::SignalRequest(bool)
    src/kudu/gutil/spinlock.h:106
  kudu::consensus::PeerManager::SignalRequest(bool)
    src/kudu/consensus/peer_manager.cc:98
  kudu::consensus::RaftConsensus::Replicate(scoped_refptr<kudu::consensus::ConsensusRound> const&)
    src/kudu/consensus/raft_consensus.cc:740 (discriminator 2)
  kudu::tablet::TransactionDriver::Prepare()
    src/kudu/tablet/transactions/transaction_driver.cc:333
  kudu::tablet::TransactionDriver::PrepareTask()
    src/kudu/tablet/transactions/transaction_driver.cc:243
  kudu::ThreadPool::DispatchThread()
    src/kudu/util/threadpool.cc:686

The stack trace pointed to 'peer_lock_' in Peer::SignalRequest(),
consensus_peer.cc,167 as in one of its previous revisions, as the
rest of the stack.

I also did other minor cleanup on the code around.

Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Reviewed-on: http://gerrit.cloudera.org:8080/16021
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_tracker.cc
6 files changed, 55 insertions(+), 46 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16021 )

Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................


Patch Set 2: Code-Review+2

You mentioned this to me on Slack:
"i also ran several micro-benches, they either showed 4-5% improvement or at least no downside"

It'd be nice to incorporate this in the commit message, or even better, add a small benchmark tracking the spinlock contention time in a low-heartbeat interval, write-heavy workload or somesuch.


-- 
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 03 Jun 2020 22:39:44 +0000
Gerrit-HasComments: No

[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16021 )

Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................


Patch Set 1: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Jun 2020 22:12:53 +0000
Gerrit-HasComments: No

[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16021

to look at the new patch set (#2).

Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................

[consensus] small cleanup on Peer::SignalRequest()

It's possible not to hold 'peer_lock_' when scheduling SendNextRequest()
on the thread pool: if 'request_pending_' or 'closed_' changes, it's
handled by Peer::SendNextRequest() as necessary.  It might be an extra
run a thread in Raft pool in the worst case, but it seems better than
blocking the thread running PrepareTask().  Blocking the latter involves
blocking other tasks on corresponding locks and slowing the replication
of a Raft operation, which leads to lower write throughput.

The motivation for this change was seeing the following in the logs
during intensive ingest workloads:

  0531 13:09:40.403224 (+515724us) spinlock_profiling.cc:244] Waited 514 ms on lock 0xa1581150

with the stack symbolized like below:

  kudu::(anonymous namespace)::SubmitSpinLockProfileData(void const*, long)
    src/kudu/util/spinlock_profiling.cc:230
  kudu::consensus::Peer::SignalRequest(bool)
    src/kudu/gutil/spinlock.h:106
  kudu::consensus::PeerManager::SignalRequest(bool)
    src/kudu/consensus/peer_manager.cc:98
  kudu::consensus::RaftConsensus::Replicate(scoped_refptr<kudu::consensus::ConsensusRound> const&)
    src/kudu/consensus/raft_consensus.cc:740 (discriminator 2)
  kudu::tablet::TransactionDriver::Prepare()
    src/kudu/tablet/transactions/transaction_driver.cc:333
  kudu::tablet::TransactionDriver::PrepareTask()
    src/kudu/tablet/transactions/transaction_driver.cc:243
  kudu::ThreadPool::DispatchThread()
    src/kudu/util/threadpool.cc:686

The stack trace pointed to 'peer_lock_' in Peer::SignalRequest(),
consensus_peer.cc,167 as in one of its previous revisions, as the
rest of the stack.

I also did other minor cleanup on the code around.

Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_tracker.cc
6 files changed, 55 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/16021/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16021 )

Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG@12
PS1, Line 12: worst the ca
nit: the worst case


http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG@14
PS1, Line 14: threads
nit: blocking other tasks?


http://gerrit.cloudera.org:8080/#/c/16021/1/src/kudu/consensus/consensus_peers.h
File src/kudu/consensus/consensus_peers.h:

http://gerrit.cloudera.org:8080/#/c/16021/1/src/kudu/consensus/consensus_peers.h@197
PS1, Line 197:   // Lock that protects Peer state changes, initialization, etc.
Could you update this to describe the new semantics of locking? E.g. seems like we don't need to hold peer_lock_ to read closed_, but we need it to update it. What other behaviors do callers need to be aware of?



-- 
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Jun 2020 22:39:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16021 )

Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................


Patch Set 2:

> You mentioned this to me on Slack:
 > "i also ran several micro-benches, they either showed 4-5%
 > improvement or at least no downside"
 > 
 > It'd be nice to incorporate this in the commit message, or even
 > better, add a small benchmark tracking the spinlock contention time
 > in a low-heartbeat interval, write-heavy workload or somesuch.

Yeah, I'm working on the test.

Thank you for the review!


-- 
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 04 Jun 2020 01:40:20 +0000
Gerrit-HasComments: No

[kudu-CR] [consensus] small cleanup on Peer::SignalRequest()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16021 )

Change subject: [consensus] small cleanup on Peer::SignalRequest()
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG@12
PS1, Line 12: worst the ca
> nit: the worst case
Done


http://gerrit.cloudera.org:8080/#/c/16021/1//COMMIT_MSG@14
PS1, Line 14: threads
> nit: blocking other tasks?
Done


http://gerrit.cloudera.org:8080/#/c/16021/1/src/kudu/consensus/consensus_peers.h
File src/kudu/consensus/consensus_peers.h:

http://gerrit.cloudera.org:8080/#/c/16021/1/src/kudu/consensus/consensus_peers.h@197
PS1, Line 197:   // Lock that protects Peer state changes, initialization, etc.
> Could you update this to describe the new semantics of locking? E.g. seems 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I899aec0964bc404e6df3742bd48d0d049e52d900
Gerrit-Change-Number: 16021
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 03 Jun 2020 03:29:08 +0000
Gerrit-HasComments: Yes