Posted to issues@kudu.apache.org by "Mike Percy (JIRA)" <ji...@apache.org> on 2019/03/26 23:39:00 UTC

[jira] [Commented] (KUDU-2727) Contention on the Raft consensus lock can cause tablet service queue overflows

    [ https://issues.apache.org/jira/browse/KUDU-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802279#comment-16802279 ] 

Mike Percy commented on KUDU-2727:
----------------------------------

I'm going to look at this in my spare time.

> Contention on the Raft consensus lock can cause tablet service queue overflows
> ------------------------------------------------------------------------------
>
>                 Key: KUDU-2727
>                 URL: https://issues.apache.org/jira/browse/KUDU-2727
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Will Berkeley
>            Assignee: Mike Percy
>            Priority: Major
>
> Here are stacks illustrating the phenomenon:
> {noformat}
>   tids=[2201]
>         0x379ba0f710 <unknown>
>            0x1fb951a base::internal::SpinLockDelay()
>            0x1fb93b7 base::SpinLock::SlowLock()
>             0xb4e68e kudu::consensus::Peer::SignalRequest()
>             0xb9c0df kudu::consensus::PeerManager::SignalRequest()
>             0xb8c178 kudu::consensus::RaftConsensus::Replicate()
>             0xaab816 kudu::tablet::TransactionDriver::Prepare()
>             0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
>            0x1fa37ed kudu::ThreadPool::DispatchThread()
>            0x1f9c2a1 kudu::Thread::SuperviseThread()
>         0x379ba079d1 start_thread
>         0x379b6e88fd clone
>   tids=[4515]
>         0x379ba0f710 <unknown>
>            0x1fb951a base::internal::SpinLockDelay()
>            0x1fb93b7 base::SpinLock::SlowLock()
>             0xb74c60 kudu::consensus::RaftConsensus::NotifyCommitIndex()
>             0xb59307 kudu::consensus::PeerMessageQueue::NotifyObserversTask()
>             0xb54058 _ZN4kudu8internal7InvokerILi2ENS0_9BindStateINS0_15RunnableAdapterIMNS_9consensus16PeerMessageQueueEFvRKSt8functionIFvPNS4_24PeerMessageQueueObserverEEEEEEFvPS5_SC_EFvNS0_17UnretainedWrapperIS5_EEZNS5_34NotifyObserversOfCommitIndexChangeElEUlS8_E_EEESH_E3RunEPNS0_13BindStateBaseE
>            0x1fa37ed kudu::ThreadPool::DispatchThread()
>            0x1f9c2a1 kudu::Thread::SuperviseThread()
>         0x379ba079d1 start_thread
>         0x379b6e88fd clone
>   tids=[22185,22194,22193,22188,22187,22186]
>         0x379ba0f710 <unknown>
>            0x1fb951a base::internal::SpinLockDelay()
>            0x1fb93b7 base::SpinLock::SlowLock()
>             0xb8bff8 kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
>             0xaaaef9 kudu::tablet::TransactionDriver::ExecuteAsync()
>             0xaa3742 kudu::tablet::TabletReplica::SubmitWrite()
>             0x92812d kudu::tserver::TabletServiceImpl::Write()
>            0x1e28f3c kudu::rpc::GeneratedServiceIf::Handle()
>            0x1e2986a kudu::rpc::ServicePool::RunThread()
>            0x1f9c2a1 kudu::Thread::SuperviseThread()
>         0x379ba079d1 start_thread
>         0x379b6e88fd clone
>   tids=[22192,22191]
>         0x379ba0f710 <unknown>
>            0x1fb951a base::internal::SpinLockDelay()
>            0x1fb93b7 base::SpinLock::SlowLock()
>            0x1e13dec kudu::rpc::ResultTracker::TrackRpc()
>            0x1e28ef5 kudu::rpc::GeneratedServiceIf::Handle()
>            0x1e2986a kudu::rpc::ServicePool::RunThread()
>            0x1f9c2a1 kudu::Thread::SuperviseThread()
>         0x379ba079d1 start_thread
>         0x379b6e88fd clone
>   tids=[4426]
>         0x379ba0f710 <unknown>
>            0x206d3d0 <unknown>
>            0x212fd25 google::protobuf::Message::SpaceUsedLong()
>            0x211dee4 google::protobuf::internal::GeneratedMessageReflection::SpaceUsedLong()
>             0xb6658e kudu::consensus::LogCache::AppendOperations()
>             0xb5c539 kudu::consensus::PeerMessageQueue::AppendOperations()
>             0xb5c7c7 kudu::consensus::PeerMessageQueue::AppendOperation()
>             0xb7c675 kudu::consensus::RaftConsensus::AppendNewRoundToQueueUnlocked()
>             0xb8c147 kudu::consensus::RaftConsensus::Replicate()
>             0xaab816 kudu::tablet::TransactionDriver::Prepare()
>             0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
>            0x1fa37ed kudu::ThreadPool::DispatchThread()
>            0x1f9c2a1 kudu::Thread::SuperviseThread()
>         0x379ba079d1 start_thread
>         0x379b6e88fd clone
> {noformat}
> {{kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()}} must take the consensus lock to check the current term and Raft role. When many write RPCs arrive for the same tablet, they contend on that lock (in the stacks above, six service threads are blocked in this call), which ties up RPC service threads and can overflow the tablet service queue on busy systems.
> Yugabyte replaced their equivalent lock with an atomic, which lets them read the term and role wait-free.
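
A minimal sketch of the atomic approach, assuming the term fits in 56 bits and that the role can be encoded in 8 bits. This is not Kudu's or Yugabyte's actual code: the names ConsensusStateSketch, RaftRole, SetTermAndRole and CheckLeadershipAndGetTerm are hypothetical. Writers (elections, step-downs) still serialize on a mutex, but they publish the (term, role) pair with a single atomic store, so the RPC hot path can check leadership with one atomic load instead of taking the lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical role enum for illustration; Kudu's real role type differs.
enum class RaftRole : uint8_t { kFollower = 0, kCandidate = 1, kLeader = 2 };

// Sketch only: packs the Raft term (upper 56 bits) and role (low 8 bits)
// into one 64-bit word so both are read together with a single atomic load.
class ConsensusStateSketch {
 public:
  // Slow path: state transitions still serialize on a mutex, then publish
  // the new (term, role) pair with one atomic store.
  void SetTermAndRole(uint64_t term, RaftRole role) {
    assert(term < (uint64_t{1} << (64 - kRoleBits)));  // assumed term range
    std::lock_guard<std::mutex> l(lock_);
    packed_.store(Pack(term, role), std::memory_order_release);
  }

  // Hot path for incoming writes: no mutex is acquired. Returns true and
  // sets *term only if this replica is currently the leader.
  bool CheckLeadershipAndGetTerm(uint64_t* term) const {
    uint64_t v = packed_.load(std::memory_order_acquire);
    if (UnpackRole(v) != RaftRole::kLeader) return false;
    *term = UnpackTerm(v);
    return true;
  }

 private:
  static constexpr int kRoleBits = 8;

  static uint64_t Pack(uint64_t term, RaftRole role) {
    return (term << kRoleBits) | static_cast<uint64_t>(role);
  }
  static uint64_t UnpackTerm(uint64_t v) { return v >> kRoleBits; }
  static RaftRole UnpackRole(uint64_t v) {
    return static_cast<RaftRole>(v & ((uint64_t{1} << kRoleBits) - 1));
  }

  std::mutex lock_;                     // serializes writers only
  std::atomic<uint64_t> packed_{0};     // term 0, follower
};
```

Packing both fields into one word is what keeps the pair consistent: with two separate atomics, a reader could observe a new term together with a stale role. As with any lock-free snapshot, the value can be out of date the moment it is read, so a design like this relies on later steps to detect a term change.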



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)