Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/07/20 16:31:00 UTC

[jira] [Commented] (KUDU-2727) Contention on the Raft consensus lock can cause tablet service queue overflows

    [ https://issues.apache.org/jira/browse/KUDU-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161365#comment-17161365 ] 

ASF subversion and git services commented on KUDU-2727:
-------------------------------------------------------

Commit 0914eb527e82acc727828d951559981c905a7f0c in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=0914eb5 ]

[tests] add same_tablet_concurrent_writes test

Added SameTabletConcurrentWritesTest.InsertsOnly test scenario.
The scenario exercises concurrent inserts from multiple clients
into the same tablet.

The purpose of the newly introduced test is to check for lock contention
when running multiple write operations against the same tablet
concurrently.  There is an interaction between threads pushing Raft
consensus updates and RPC worker threads serving write requests, and
the test pinpoints the contention on the lock primitives used in
RaftConsensus.
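
As an illustration only (this is not the test code itself), a
client-side driver for this kind of scenario could look like the sketch
below, written against the public Kudu C++ client API.  The table name,
schema, master address, and thread/row counts are made-up assumptions.

  // Sketch of a multi-threaded insert driver.  Assumes a pre-created table
  // "same_tablet_test" with schema (key INT64 PRIMARY KEY, val STRING) and
  // a single tablet, so all writers hit the same tablet replica.
  #include <cstdint>
  #include <cstdlib>
  #include <iostream>
  #include <memory>
  #include <thread>
  #include <vector>

  #include <kudu/client/client.h>

  using kudu::client::KuduClient;
  using kudu::client::KuduClientBuilder;
  using kudu::client::KuduInsert;
  using kudu::client::KuduSession;
  using kudu::client::KuduTable;
  using kudu::client::sp::shared_ptr;

  namespace {

  void CheckOk(const kudu::Status& s) {
    if (!s.ok()) {
      std::cerr << s.ToString() << std::endl;
      std::abort();
    }
  }

  // Each writer uses its own session; with a single-tablet table every
  // write RPC below lands on the same tablet.
  void WriterThread(shared_ptr<KuduClient> client,
                    int64_t thread_idx, int64_t num_rows) {
    shared_ptr<KuduTable> table;
    CheckOk(client->OpenTable("same_tablet_test", &table));
    shared_ptr<KuduSession> session = client->NewSession();
    CheckOk(session->SetFlushMode(KuduSession::AUTO_FLUSH_SYNC));
    for (int64_t i = 0; i < num_rows; ++i) {
      std::unique_ptr<KuduInsert> insert(table->NewInsert());
      CheckOk(insert->mutable_row()->SetInt64("key", thread_idx * num_rows + i));
      CheckOk(insert->mutable_row()->SetString("val", "x"));
      CheckOk(session->Apply(insert.release()));  // Apply() takes ownership.
    }
    CheckOk(session->Flush());
  }

  }  // anonymous namespace

  int main() {
    shared_ptr<KuduClient> client;
    CheckOk(KuduClientBuilder()
                .add_master_server_addr("127.0.0.1:7051")  // assumed local master
                .Build(&client));
    std::vector<std::thread> writers;
    for (int64_t t = 0; t < 8; ++t) {  // 8 concurrent writers, an arbitrary count
      writers.emplace_back(WriterThread, client, t, int64_t(1000));
    }
    for (auto& w : writers) {
      w.join();
    }
    return 0;
  }

The actual integration test differs in its details; for instance, per
the notes below, it runs against a WAL with extra latency injected for
each fsync so that the write path is deliberately slow.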

To validate the results reported by the test, I verified that RPC queue
overflows happen much less often when using the lock-free implementation
of RaftConsensus::CheckLeadershipAndBindTerm() introduced by the
KUDU-2727 fix (da1c66b61).

The rate of successful write operations was the same in both cases,
which is expected since the bottleneck is the WAL (where an additional
static delay is injected for each fsync).  However, the number of
messages from spinlock_profiling.cc like
  Waited 190 ms on lock 0x237acd4 ...
dropped significantly after the KUDU-2727 fix (da1c66b61).  Less
contention is good news because the freed CPU resources can be spent on
something useful, such as handling another RPC request from the queue
(which no longer overflows and can accommodate extra requests).

Below are snippets of various measurements taken for this new test
before and after applying the patch from the 16034 review item on top.

========================================================================

Without 16034 patch:
 Performance counter stats for './bin/same_tablet_concurrent_writes-itest':

       7449.425640 task-clock                #    0.504 CPUs utilized
            47,882 context-switches          #    0.006 M/sec
             3,454 cpu-migrations            #    0.464 K/sec
            28,592 page-faults               #    0.004 M/sec
    10,211,586,270 cycles                    #    1.371 GHz
    10,647,306,766 instructions              #    1.04  insns per cycle
     1,861,229,149 branches                  #  249.849 M/sec
        25,370,590 branch-misses             #    1.36% of all branches

      14.767762000 seconds time elapsed

With 16034 patch:
 Performance counter stats for './bin/same_tablet_concurrent_writes-itest':

       5646.543970 task-clock                #    0.394 CPUs utilized
            39,194 context-switches          #    0.007 M/sec
             3,715 cpu-migrations            #    0.658 K/sec
            30,090 page-faults               #    0.005 M/sec
     8,543,832,082 cycles                    #    1.513 GHz
     9,301,870,856 instructions              #    1.09  insns per cycle
     1,590,579,357 branches                  #  281.691 M/sec
        18,563,203 branch-misses             #    1.17% of all branches

      14.339274728 seconds time elapsed

========================================================================

------------------------------------------------------------------------
                           | Without 16034 patch   |  With 16034 patch
------------------------------------------------------------------------
  write RPC request rate   | 15.8 req/sec          | 16 req/sec
  RPC queue overflows      | 1898                  | 50
  spinlock_contention_time | 22966310              | 9161557
------------------------------------------------------------------------
  rpc_incoming_queue_time  |                       |
                           | Count: 82             | Count: 82
                           | Mean: 199704          | Mean: 1037.87
                           | Percentiles:          | Percentiles:
                           |   0%  (min) = 35      |   0%  (min) = 21
                           |  25%        = 5844    |  25%        = 51
                           |  50%  (med) = 196096  |  50%  (med) = 67
                           |  75%        = 388352  |  75%        = 1608
                           |  95%        = 400640  |  95%        = 3334
                           |  99%        = 598016  |  99%        = 9960
                           |  99.9%      = 599552  |  99.9%      = 10064
                           |  99.99%     = 599552  |  99.99%     = 10064
                           |  100% (max) = 600048  |  100% (max) = 10066
------------------------------------------------------------------------
  op_apply_run_time        |                       |
                           | Count: 79             | Count: 80
                           | Mean: 99377.1         | Mean: 80796.3
                           | Percentiles:          | Percentiles:
                           |   0%  (min) = 429     |   0%  (min) = 575
                           |  25%        = 940     |  25%        = 828
                           |  50%  (med) = 1336    |  50%  (med) = 1064
                           |  75%        = 200704  |  75%        = 200704
                           |  95%        = 391168  |  95%        = 200704
                           |  99%        = 401408  |  99%        = 200704
                           |  99.9%      = 401408  |  99.9%      = 399360
                           |  99.99%     = 401408  |  99.99%     = 399360
                           |  100% (max) = 401432  |  100% (max) = 399703
------------------------------------------------------------------------
  handler_latency_kudu_tserver_TabletServerService_Write:
                           | Count: 49             | Count: 45
                           | Mean: 3.11688e+06     | Mean: 3.08435e+06
                           | Percentiles:          | Percentiles:
                           |   0%  (min) = 611494  |   0%  (min) = 636928
                           |  25%        = 1802240 |  25%        = 2392064
                           |  50%  (med) = 3391488 |  50%  (med) = 3178496
                           |  75%        = 4390912 |  75%        = 3997696
                           |  95%        = 4816896 |  95%        = 4587520
                           |  99%        = 5013504 |  99%        = 4587520
                           |  99.9%      = 5013504 |  99.9%      = 4587520
                           |  99.99%     = 5013504 |  99.99%     = 4587520
                           |  100% (max) = 5023673 |  100% (max) = 4616174
------------------------------------------------------------------------

Change-Id: I7eef6e46e7685450354473cee9d804c5054723eb
Reviewed-on: http://gerrit.cloudera.org:8080/16060
Reviewed-by: Attila Bukor <ab...@apache.org>
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke <gr...@apache.org>


> Contention on the Raft consensus lock can cause tablet service queue overflows
> ------------------------------------------------------------------------------
>
>                 Key: KUDU-2727
>                 URL: https://issues.apache.org/jira/browse/KUDU-2727
>             Project: Kudu
>          Issue Type: Improvement
>          Components: consensus, tserver
>            Reporter: William Berkeley
>            Assignee: Alexey Serbin
>            Priority: Major
>              Labels: performance, scalability
>             Fix For: 1.13.0
>
>
> Here are stacks illustrating the phenomenon:
> {noformat}
>   tids=[2201]
>         0x379ba0f710 <unknown>
>            0x1fb951a base::internal::SpinLockDelay()
>            0x1fb93b7 base::SpinLock::SlowLock()
>             0xb4e68e kudu::consensus::Peer::SignalRequest()
>             0xb9c0df kudu::consensus::PeerManager::SignalRequest()
>             0xb8c178 kudu::consensus::RaftConsensus::Replicate()
>             0xaab816 kudu::tablet::TransactionDriver::Prepare()
>             0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
>            0x1fa37ed kudu::ThreadPool::DispatchThread()
>            0x1f9c2a1 kudu::Thread::SuperviseThread()
>         0x379ba079d1 start_thread
>         0x379b6e88fd clone
>   tids=[4515]
>         0x379ba0f710 <unknown>
>            0x1fb951a base::internal::SpinLockDelay()
>            0x1fb93b7 base::SpinLock::SlowLock()
>             0xb74c60 kudu::consensus::RaftConsensus::NotifyCommitIndex()
>             0xb59307 kudu::consensus::PeerMessageQueue::NotifyObserversTask()
>             0xb54058 _ZN4kudu8internal7InvokerILi2ENS0_9BindStateINS0_15RunnableAdapterIMNS_9consensus16PeerMessageQueueEFvRKSt8functionIFvPNS4_24PeerMessageQueueObserverEEEEEEFvPS5_SC_EFvNS0_17UnretainedWrapperIS5_EEZNS5_34NotifyObserversOfCommitIndexChangeElEUlS8_E_EEESH_E3RunEPNS0_13BindStateBaseE
>            0x1fa37ed kudu::ThreadPool::DispatchThread()
>            0x1f9c2a1 kudu::Thread::SuperviseThread()
>         0x379ba079d1 start_thread
>         0x379b6e88fd clone
>   tids=[22185,22194,22193,22188,22187,22186]
>         0x379ba0f710 <unknown>
>            0x1fb951a base::internal::SpinLockDelay()
>            0x1fb93b7 base::SpinLock::SlowLock()
>             0xb8bff8 kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
>             0xaaaef9 kudu::tablet::TransactionDriver::ExecuteAsync()
>             0xaa3742 kudu::tablet::TabletReplica::SubmitWrite()
>             0x92812d kudu::tserver::TabletServiceImpl::Write()
>            0x1e28f3c kudu::rpc::GeneratedServiceIf::Handle()
>            0x1e2986a kudu::rpc::ServicePool::RunThread()
>            0x1f9c2a1 kudu::Thread::SuperviseThread()
>         0x379ba079d1 start_thread
>         0x379b6e88fd clone
>   tids=[22192,22191]
>         0x379ba0f710 <unknown>
>            0x1fb951a base::internal::SpinLockDelay()
>            0x1fb93b7 base::SpinLock::SlowLock()
>            0x1e13dec kudu::rpc::ResultTracker::TrackRpc()
>            0x1e28ef5 kudu::rpc::GeneratedServiceIf::Handle()
>            0x1e2986a kudu::rpc::ServicePool::RunThread()
>            0x1f9c2a1 kudu::Thread::SuperviseThread()
>         0x379ba079d1 start_thread
>         0x379b6e88fd clone
>   tids=[4426]
>         0x379ba0f710 <unknown>
>            0x206d3d0 <unknown>
>            0x212fd25 google::protobuf::Message::SpaceUsedLong()
>            0x211dee4 google::protobuf::internal::GeneratedMessageReflection::SpaceUsedLong()
>             0xb6658e kudu::consensus::LogCache::AppendOperations()
>             0xb5c539 kudu::consensus::PeerMessageQueue::AppendOperations()
>             0xb5c7c7 kudu::consensus::PeerMessageQueue::AppendOperation()
>             0xb7c675 kudu::consensus::RaftConsensus::AppendNewRoundToQueueUnlocked()
>             0xb8c147 kudu::consensus::RaftConsensus::Replicate()
>             0xaab816 kudu::tablet::TransactionDriver::Prepare()
>             0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
>            0x1fa37ed kudu::ThreadPool::DispatchThread()
>            0x1f9c2a1 kudu::Thread::SuperviseThread()
>         0x379ba079d1 start_thread
>         0x379b6e88fd clone
> {noformat}
> {{kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()}} needs to take the lock to check the term and the Raft role. When many RPCs come in for the same tablet, the contention can hog service threads and cause queue overflows on busy systems.
> Yugabyte switched their equivalent lock to an atomic, which allows them to read the term and role wait-free.
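
As a rough sketch of that wait-free approach (not the code of the actual
da1c66b61 fix; class and method names here are illustrative), the term
and leadership flag can be packed into a single atomic word that Raft
state changes update under the existing consensus lock, while the write
RPC path reads it with a plain atomic load:

  #include <atomic>
  #include <cstdint>
  #include <mutex>

  // Illustrative only: a packed term/leadership snapshot maintained
  // alongside the consensus state it mirrors.
  class LeaderTermCache {
   public:
    // Called whenever the term or role changes; writers still serialize
    // on a lock, but that is off the hot RPC path.
    void Update(int64_t term, bool is_leader) {
      std::lock_guard<std::mutex> l(write_lock_);
      // Low bit: leadership flag; remaining bits: current term.
      packed_.store((static_cast<uint64_t>(term) << 1) | (is_leader ? 1 : 0),
                    std::memory_order_release);
    }

    // Wait-free replacement for the hot-path check: no lock is taken, so
    // RPC handler threads cannot pile up behind a held consensus lock here.
    bool CheckLeadershipAndGetTerm(int64_t* term) const {
      const uint64_t v = packed_.load(std::memory_order_acquire);
      if ((v & 1) == 0) {
        return false;  // Not the leader; reject the write.
      }
      *term = static_cast<int64_t>(v >> 1);
      return true;
    }

   private:
    std::mutex write_lock_;            // serializes writers only
    std::atomic<uint64_t> packed_{0};  // (term << 1) | is_leader
  };

The real fix has further details to get right (for example, binding the
term to an operation consistently with concurrent role changes), but the
essential point is that readers on the Write() path no longer contend
with the threads pushing Raft consensus updates.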


