You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2020/06/05 04:21:19 UTC

[kudu-CR] WIP [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16034


Change subject: WIP [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................

WIP [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

This patch is an attempt to address the following lock contention issue
outlined in KUDU-2727:

  tids=[93293,93284,93285,93286,93287,93288,93289,93290,93291,93292,93304,93294,93295,93296,93297,93298,93299,93300,93301,93302,93303,93313,93322,93321,93320,93319,93318,93317,93316,93315,93314,93283,93312,93311,93310,93309,93308,93307,93306,93305]
      0x7f61b79fc5e0 <unknown>
           0x1ec35f4 base::internal::SpinLockDelay()
           0x1ec347c base::SpinLock::SlowLock()
            0xb7deb8 kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
            0xaab010 kudu::tablet::TransactionDriver::ExecuteAsync()
            0xaa344c kudu::tablet::TabletReplica::SubmitWrite()
            0x928fb0 kudu::tserver::TabletServiceImpl::Write()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone
  tids=[93383]
      0x7f61b79fc5e0 <unknown>
      0x7f61b79f8cf2 __pthread_cond_timedwait
           0x1dfcfa9 kudu::ConditionVariable::WaitUntil()
            0xb73bc7 kudu::consensus::RaftConsensus::UpdateReplica()
            0xb75128 kudu::consensus::RaftConsensus::Update()
            0x92c5d1 kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone

WIP:
  * this is just a POC, need to run more testing (linked-list-test, etc)
  * some standard pre-commit tests are still failing

Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_meta.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
4 files changed, 33 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/16034/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16034 )

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/consensus_meta.h
File src/kudu/consensus/consensus_meta.h:

http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/consensus_meta.h@249
PS5, Line 249: 
             :   // Cached term and role of the peer_uuid_ within the active configuration,
             :   // packed into 64-bit integer.
> nit: could you add some color to this comment as to why we are using this c
Done


http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc@717
PS5, Line 717:   const auto s = AppendNewRoundToQueueUnlocked(round);
             :   if (s.ok()) {
             :     leader_is_ready_ = true;
             :   }
             :   return s;
> micro-nit: perhaps simpler as
Done


http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc@779
PS5, Line 779: // The order of reading role_and_term and leader_is_ready is essential to
             :   // deal with situations when leader_is_ready and role_and_term are read
             :   // in different Raft terms.
> Just checking my understanding of the ordering nuances here:
Right, exactly.  I added this as a comment.  Thank you for this summary.



-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 20 Jun 2020 03:52:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16034 )

Change subject: WIP [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................


Patch Set 2:

(2 comments)

Interesting trick to atomically store/retrieve 2 variables.
Just some minor comments.

http://gerrit.cloudera.org:8080/#/c/16034/2/src/kudu/consensus/consensus_meta.cc
File src/kudu/consensus/consensus_meta.cc:

http://gerrit.cloudera.org:8080/#/c/16034/2/src/kudu/consensus/consensus_meta.cc@61
PS2, Line 61: constexpr size_t kPackedRoleBits = 3;
Worth adding an acsii diagram of how the role and term bits are packed including bit widths for each.


http://gerrit.cloudera.org:8080/#/c/16034/2/src/kudu/consensus/consensus_meta.cc@75
PS2, Line 75: crush
typo



-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 17 Jun 2020 23:40:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16034 )

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16034/2/src/kudu/consensus/consensus_meta.cc
File src/kudu/consensus/consensus_meta.cc:

http://gerrit.cloudera.org:8080/#/c/16034/2/src/kudu/consensus/consensus_meta.cc@61
PS2, Line 61: constexpr size_t kPackedRoleBits = 3;
> Worth adding an acsii diagram of how the role and term bits are packed incl
Done


http://gerrit.cloudera.org:8080/#/c/16034/2/src/kudu/consensus/consensus_meta.cc@75
PS2, Line 75:   ^  
> typo
Done


http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc@201
PS5, Line 201: LockGuard l(lock_);
> Why is the lock needed now?
It's not very likely that Init() will be called concurrently by multiple threads, however SetStateUnlocked() is called in this method.  The name of the method implies the lock is held.  Not holding the lock triggers DCHECK() assertion in SetStateUnlocked().



-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 19 Jun 2020 17:25:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16034 )

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................

[consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

This addresses the following lock contention issue outlined in KUDU-2727:

  tids=[93293,93284,93285,93286,93287,93288,93289,93290,93291,93292,93304,93294,93295,93296,93297,93298,93299,93300,93301,93302,93303,93313,93322,93321,93320,93319,93318,93317,93316,93315,93314,93283,93312,93311,93310,93309,93308,93307,93306,93305]
      0x7f61b79fc5e0 <unknown>
           0x1ec35f4 base::internal::SpinLockDelay()
           0x1ec347c base::SpinLock::SlowLock()
            0xb7deb8 kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
            0xaab010 kudu::tablet::TransactionDriver::ExecuteAsync()
            0xaa344c kudu::tablet::TabletReplica::SubmitWrite()
            0x928fb0 kudu::tserver::TabletServiceImpl::Write()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone
  tids=[93383]
      0x7f61b79fc5e0 <unknown>
      0x7f61b79f8cf2 __pthread_cond_timedwait
           0x1dfcfa9 kudu::ConditionVariable::WaitUntil()
            0xb73bc7 kudu::consensus::RaftConsensus::UpdateReplica()
            0xb75128 kudu::consensus::RaftConsensus::Update()
            0x92c5d1 kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone

In addition, with this patch RaftConsensus::DumpStatusHtml() no longer
blocks Raft consensus activity and isn't blocked by it either.

The inspiration for this update came from Yugabyte-DB:
  https://github.com/yugabyte/yugabyte-db/commit/7b3271e4a0f34ff3c3f6e8961288db9cee335461

Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Reviewed-on: http://gerrit.cloudera.org:8080/16034
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_meta.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tserver/tablet_service.cc
6 files changed, 202 insertions(+), 45 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Andrew Wong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16034

to look at the new patch set (#6).

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................

[consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

This addresses the following lock contention issue outlined in KUDU-2727:

  tids=[93293,93284,93285,93286,93287,93288,93289,93290,93291,93292,93304,93294,93295,93296,93297,93298,93299,93300,93301,93302,93303,93313,93322,93321,93320,93319,93318,93317,93316,93315,93314,93283,93312,93311,93310,93309,93308,93307,93306,93305]
      0x7f61b79fc5e0 <unknown>
           0x1ec35f4 base::internal::SpinLockDelay()
           0x1ec347c base::SpinLock::SlowLock()
            0xb7deb8 kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
            0xaab010 kudu::tablet::TransactionDriver::ExecuteAsync()
            0xaa344c kudu::tablet::TabletReplica::SubmitWrite()
            0x928fb0 kudu::tserver::TabletServiceImpl::Write()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone
  tids=[93383]
      0x7f61b79fc5e0 <unknown>
      0x7f61b79f8cf2 __pthread_cond_timedwait
           0x1dfcfa9 kudu::ConditionVariable::WaitUntil()
            0xb73bc7 kudu::consensus::RaftConsensus::UpdateReplica()
            0xb75128 kudu::consensus::RaftConsensus::Update()
            0x92c5d1 kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone

In addition, with this patch RaftConsensus::DumpStatusHtml() no longer
blocks Raft consensus activity and isn't blocked by it either.

The inspiration for this update came from Yugabyte-DB:
  https://github.com/yugabyte/yugabyte-db/commit/7b3271e4a0f34ff3c3f6e8961288db9cee335461

Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_meta.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tserver/tablet_service.cc
6 files changed, 202 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/16034/6
-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16034

to look at the new patch set (#2).

Change subject: WIP [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................

WIP [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

This patch is an attempt to address the following lock contention issue
outlined in KUDU-2727:

  tids=[93293,93284,93285,93286,93287,93288,93289,93290,93291,93292,93304,93294,93295,93296,93297,93298,93299,93300,93301,93302,93303,93313,93322,93321,93320,93319,93318,93317,93316,93315,93314,93283,93312,93311,93310,93309,93308,93307,93306,93305]
      0x7f61b79fc5e0 <unknown>
           0x1ec35f4 base::internal::SpinLockDelay()
           0x1ec347c base::SpinLock::SlowLock()
            0xb7deb8 kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
            0xaab010 kudu::tablet::TransactionDriver::ExecuteAsync()
            0xaa344c kudu::tablet::TabletReplica::SubmitWrite()
            0x928fb0 kudu::tserver::TabletServiceImpl::Write()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone
  tids=[93383]
      0x7f61b79fc5e0 <unknown>
      0x7f61b79f8cf2 __pthread_cond_timedwait
           0x1dfcfa9 kudu::ConditionVariable::WaitUntil()
            0xb73bc7 kudu::consensus::RaftConsensus::UpdateReplica()
            0xb75128 kudu::consensus::RaftConsensus::Update()
            0x92c5d1 kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone

The inspiration for this update comes from Yugabyte-DB:
  https://github.com/yugabyte/yugabyte-db/commit/7b3271e4a0f34ff3c3f6e8961288db9cee335461

WIP:
  * the regression test for KUDU-597 is failing (RaftConsensusITest.TestKUDU_597)
  * the assert from mvcc.cc sometimes triggers elsewhere:
      F0606 05:16:10.402253  1593 mvcc.cc:68] Check failed: !cur_snap_.IsCommitted(timestamp) Trying to start a new txn at an already committed timestamp: 6518458634355691520, current MVCC snapshot: MvccSnapshot[committed={T|T < 6518458634355847168}]

Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_meta.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
5 files changed, 135 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/16034/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16034

to look at the new patch set (#3).

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................

[consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

This addresses the following lock contention issue outlined in KUDU-2727:

  tids=[93293,93284,93285,93286,93287,93288,93289,93290,93291,93292,93304,93294,93295,93296,93297,93298,93299,93300,93301,93302,93303,93313,93322,93321,93320,93319,93318,93317,93316,93315,93314,93283,93312,93311,93310,93309,93308,93307,93306,93305]
      0x7f61b79fc5e0 <unknown>
           0x1ec35f4 base::internal::SpinLockDelay()
           0x1ec347c base::SpinLock::SlowLock()
            0xb7deb8 kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
            0xaab010 kudu::tablet::TransactionDriver::ExecuteAsync()
            0xaa344c kudu::tablet::TabletReplica::SubmitWrite()
            0x928fb0 kudu::tserver::TabletServiceImpl::Write()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone
  tids=[93383]
      0x7f61b79fc5e0 <unknown>
      0x7f61b79f8cf2 __pthread_cond_timedwait
           0x1dfcfa9 kudu::ConditionVariable::WaitUntil()
            0xb73bc7 kudu::consensus::RaftConsensus::UpdateReplica()
            0xb75128 kudu::consensus::RaftConsensus::Update()
            0x92c5d1 kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone

In addition, with this patch RaftConsensus::DumpStatusHtml() no longer
blocks Raft consensus activity and isn't blocked by it either.

The inspiration for this update came from Yugabyte-DB:
  https://github.com/yugabyte/yugabyte-db/commit/7b3271e4a0f34ff3c3f6e8961288db9cee335461

Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_meta.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tserver/tablet_service.cc
6 files changed, 179 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/16034/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16034 )

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc@201
PS5, Line 201: LockGuard l(lock_);
> It's not very likely that Init() will be called concurrently by multiple th
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 19 Jun 2020 17:48:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16034 )

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................


Patch Set 5:

(1 comment)

LGTM but don't have enough background to vote on the change.

http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc@201
PS5, Line 201: LockGuard l(lock_);
Why is the lock needed now?



-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 19 Jun 2020 17:17:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16034 )

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................


Patch Set 5:

(3 comments)

Have you run any tests or benchmarks that illustrate the improvement this patch makes on the contention?

http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/consensus_meta.h
File src/kudu/consensus/consensus_meta.h:

http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/consensus_meta.h@249
PS5, Line 249: 
             :   // Cached term and role of the peer_uuid_ within the active configuration,
             :   // packed into 64-bit integer.
nit: could you add some color to this comment as to why we are using this cached value vs active_role_, and when to use which?


http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc@717
PS5, Line 717:   const auto s = AppendNewRoundToQueueUnlocked(round);
             :   if (s.ok()) {
             :     leader_is_ready_ = true;
             :   }
             :   return s;
micro-nit: perhaps simpler as

 RETURN_NOT_OK(AppendNewRoundToQueueUnlocked(round));
 leader_is_ready_ = true;
 return Status::OK();


http://gerrit.cloudera.org:8080/#/c/16034/5/src/kudu/consensus/raft_consensus.cc@779
PS5, Line 779: // The order of reading role_and_term and leader_is_ready is essential to
             :   // deal with situations when leader_is_ready and role_and_term are read
             :   // in different Raft terms.
Just checking my understanding of the ordering nuances here:
- it's safe to bind to a stale term even if we've asserted leadership in a newer term -- the stale op will be rejected on followers anyway because Raft guarantees that a majority of replicas have accepted the new term
- it's safe for the term to be stale if we haven't asserted leadership for a newer term because we'll exit out below
- it's _unsafe_ to bind to a newer term if we've asserted leadership for a stale term, hence this ordering



-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 20 Jun 2020 02:15:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16034

to look at the new patch set (#5).

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................

[consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

This addresses the following lock contention issue outlined in KUDU-2727:

  tids=[93293,93284,93285,93286,93287,93288,93289,93290,93291,93292,93304,93294,93295,93296,93297,93298,93299,93300,93301,93302,93303,93313,93322,93321,93320,93319,93318,93317,93316,93315,93314,93283,93312,93311,93310,93309,93308,93307,93306,93305]
      0x7f61b79fc5e0 <unknown>
           0x1ec35f4 base::internal::SpinLockDelay()
           0x1ec347c base::SpinLock::SlowLock()
            0xb7deb8 kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
            0xaab010 kudu::tablet::TransactionDriver::ExecuteAsync()
            0xaa344c kudu::tablet::TabletReplica::SubmitWrite()
            0x928fb0 kudu::tserver::TabletServiceImpl::Write()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone
  tids=[93383]
      0x7f61b79fc5e0 <unknown>
      0x7f61b79f8cf2 __pthread_cond_timedwait
           0x1dfcfa9 kudu::ConditionVariable::WaitUntil()
            0xb73bc7 kudu::consensus::RaftConsensus::UpdateReplica()
            0xb75128 kudu::consensus::RaftConsensus::Update()
            0x92c5d1 kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone

In addition, with this patch RaftConsensus::DumpStatusHtml() no longer
blocks Raft consensus activity and isn't blocked by it either.

The inspiration for this update came from Yugabyte-DB:
  https://github.com/yugabyte/yugabyte-db/commit/7b3271e4a0f34ff3c3f6e8961288db9cee335461

Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_meta.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tserver/tablet_service.cc
6 files changed, 192 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/16034/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16034 )

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................


Patch Set 6:

> Seems you have benchmarked this here:
 > 
 > https://gerrit.cloudera.org/c/16060/

Right: that test was written specifically to trigger conditions this patch addresses.  The write request rate to the tablet is kept the same (because all those requests are blocked on a slowed WAL), but the  lock contention is gone and RPC queue has capacity to accept other requests.

Thank you for review!


-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 22 Jun 2020 18:55:25 +0000
Gerrit-HasComments: No

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16034

to look at the new patch set (#4).

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................

[consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

This addresses the following lock contention issue outlined in KUDU-2727:

  tids=[93293,93284,93285,93286,93287,93288,93289,93290,93291,93292,93304,93294,93295,93296,93297,93298,93299,93300,93301,93302,93303,93313,93322,93321,93320,93319,93318,93317,93316,93315,93314,93283,93312,93311,93310,93309,93308,93307,93306,93305]
      0x7f61b79fc5e0 <unknown>
           0x1ec35f4 base::internal::SpinLockDelay()
           0x1ec347c base::SpinLock::SlowLock()
            0xb7deb8 kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
            0xaab010 kudu::tablet::TransactionDriver::ExecuteAsync()
            0xaa344c kudu::tablet::TabletReplica::SubmitWrite()
            0x928fb0 kudu::tserver::TabletServiceImpl::Write()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone
  tids=[93383]
      0x7f61b79fc5e0 <unknown>
      0x7f61b79f8cf2 __pthread_cond_timedwait
           0x1dfcfa9 kudu::ConditionVariable::WaitUntil()
            0xb73bc7 kudu::consensus::RaftConsensus::UpdateReplica()
            0xb75128 kudu::consensus::RaftConsensus::Update()
            0x92c5d1 kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
           0x1d2e8d9 kudu::rpc::GeneratedServiceIf::Handle()
           0x1d2efd9 kudu::rpc::ServicePool::RunThread()
           0x1ea4a84 kudu::Thread::SuperviseThread()
      0x7f61b79f4e25 start_thread
      0x7f61b5cd234d __clone

In addition, with this patch RaftConsensus::DumpStatusHtml() no longer
blocks Raft consensus activity and isn't blocked by it either.

The inspiration for this update came from Yugabyte-DB:
  https://github.com/yugabyte/yugabyte-db/commit/7b3271e4a0f34ff3c3f6e8961288db9cee335461

Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_meta.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tserver/tablet_service.cc
6 files changed, 191 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/16034/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16034 )

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................


Patch Set 6: Code-Review+2

Seems you have benchmarked this here:

https://gerrit.cloudera.org/c/16060/


-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 22 Jun 2020 18:42:14 +0000
Gerrit-HasComments: No

[kudu-CR] [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16034 )

Change subject: [consensus] KUDU-2727 lock-free CheckLeadershipAndBindTerm()
......................................................................


Patch Set 6: Verified+1

Unrelated tests failures:
  * in TSAN, Java test org.apache.kudu.client.TestSecurity, scenario testExternallyProvidedSubjectRefreshedExternally: unable to connect to master
  * in DEBUG: https://issues.apache.org/jira/browse/KUDU-2335


-- 
To view, visit http://gerrit.cloudera.org:8080/16034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934ae3035d893fd850afe27d96f8dd6612c9ffbd
Gerrit-Change-Number: 16034
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 20 Jun 2020 04:57:04 +0000
Gerrit-HasComments: No