Posted to issues@impala.apache.org by "Michael Ho (JIRA)" <ji...@apache.org> on 2018/02/21 07:40:00 UTC

[jira] [Resolved] (IMPALA-5528) tcmalloc contention much higher with concurrency after KRPC patch

     [ https://issues.apache.org/jira/browse/IMPALA-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Ho resolved IMPALA-5528.
--------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0
                   Impala 3.0

> tcmalloc contention much higher with concurrency after KRPC patch
> -----------------------------------------------------------------
>
>                 Key: IMPALA-5528
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5528
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>    Affects Versions: Impala 2.10.0
>            Reporter: Henry Robinson
>            Assignee: Mostafa Mokhtar
>            Priority: Critical
>             Fix For: Impala 3.0, Impala 2.12.0
>
>
> Our testing has revealed that under high concurrency (e.g. the {{many_independent_fragment_instances}} primitive), KRPC slows down execution significantly.
> This JIRA tracks the overall issue and links to the JIRAs for specific spot fixes. The {{perf}} output below was collected on one node of a 16-node cluster while running the {{many_independent_fragment_instances}} primitive.
> {code}
> -  13.12%  impalad  impalad              [.] tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
>    - tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
>       - 93.95% tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
>          - tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)
>             - 98.16% operator new[](unsigned long)
>                  29.20% impala::RowDescriptor::RowDescriptor(impala::RowDescriptor const&)
>                  16.85% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
>                  12.58% impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
>                  7.42% kudu::rpc::OutboundTransfer::CreateForCallResponse(std::vector<kudu::Slice, std::allocator<kudu::Slice> > const&, kudu::rpc::TransferCallbacks*)
>                + 4.34% impala::Codec::CreateDecompressor(impala::MemPool*, bool, impala::THdfsCompression::type, boost::scoped_ptr<impala::Codec>*)
>                  4.09% kudu::Trace::Trace()
>                  3.79% std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)
>                + 3.59% kudu::rpc::InboundCall::InboundCall(kudu::rpc::Connection*)
>                  2.66% void std::vector<impala::MemPool::ChunkInfo, std::allocator<impala::MemPool::ChunkInfo> >::_M_emplace_back_aux<impala::MemPool::ChunkInfo>(impala::MemPool::ChunkInfo&&)
>                + 2.57% kudu::rpc::Connection::HandleIncomingCall(gscoped_ptr<kudu::rpc::InboundTransfer, kudu::DefaultDeleter<kudu::rpc::InboundTransfer> >)
>                  2.04% std::vector<kudu::Slice, std::allocator<kudu::Slice> >::reserve(unsigned long)
>                  1.92% kudu::rpc::RequestHeader::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
>                  1.91% kudu::rpc::RemoteMethodPB::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
>                  1.48% kudu::rpc::Connection::ReadHandler(ev::io&, int)
>                  0.87% kudu::HeapBufferAllocator::AllocateInternal(unsigned long, unsigned long, kudu::BufferAllocator*)
>                  0.79% kudu::faststring::GrowArray(unsigned long)
>                  0.72% kudu::rpc::OutboundTransfer::CreateForCallRequest(int, std::vector<kudu::Slice, std::allocator<kudu::Slice> > const&, kudu::rpc::TransferCallbacks*)
>                  0.69% kudu::rpc::Connection::QueueOutboundCall(std::shared_ptr<kudu::rpc::OutboundCall> const&)
>                  0.69% kudu::ArenaBase<true>::ArenaBase(unsigned long, unsigned long)
>                  0.68% void std::vector<std::unique_ptr<kudu::ArenaBase<true>::Component, std::default_delete<kudu::ArenaBase<true>::Component> >, std::allocator<std::unique_ptr<kudu::ArenaBase<true>::Component, std::default_delete<kudu::ArenaBase<true>::Component> > > >::_M_emplace_back_aux<std::unique_ptr<kudu::A
>                  0.57% impala::TransmitDataResponsePb::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
>             + 1.84% tc_malloc
>       + 3.03% tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)
>       + 3.02% tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)
> -  12.49%  impalad  impalad              [.] SpinLock::SpinLoop()
>    - SpinLock::SpinLoop()
>       - 98.56% SpinLock::SlowLock()
>          - 80.48% tcmalloc::CentralFreeList::InsertRange(void*, void*, int)
>             - tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
>                - 99.99% tcmalloc::ThreadCache::Scavenge()
>                   - operator delete[](void*, std::nothrow_t const&)
>                      - 22.51% impala::RowBatch::RowBatch(impala::RowDescriptor const&, impala::InboundProtoRowBatch const&, impala::MemTracker*)
>                           impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
>                        21.66% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
>                        19.52% impala::TransmitDataResponsePb::~TransmitDataResponsePb()
>                        15.30% kudu::rpc::InboundCall::~InboundCall()
>                        5.69% kudu::rpc::QueueTransferTask::Run(kudu::rpc::ReactorThread*)
>                        3.97% std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::InboundCall*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned long, kudu::rpc::InboundCall*, st
>                        2.44% kudu::rpc::RpcContext::~RpcContext()
>                        2.20% kudu::rpc::ReactorThread::AsyncHandler(ev::async&, int)
>                        1.91% std::unordered_map<unsigned long, kudu::rpc::Connection::CallAwaitingResponse*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::Connection::CallAwaitingResponse*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<
>                        1.05% kudu::Trace::~Trace()
>                        0.50% kudu::rpc::Connection::CallAwaitingResponse::~CallAwaitingResponse()
>          + 9.38% tcmalloc::ThreadCache::IncreaseCacheLimit()
>          + 7.43% tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
>          + 1.50% tcmalloc::CentralFreeList::Populate()
>          + 1.19% tcmalloc::CentralFreeList::ReleaseToSpans(void*)
>       + 1.13% tcmalloc::CentralFreeList::InsertRange(void*, void*, int)
> -   8.95%  impalad  impalad              [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
>    - tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
>       - 99.71% tcmalloc::ThreadCache::Scavenge()
>          - operator delete[](void*, std::nothrow_t const&)
>               27.47% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
>             - 22.12% impala::RowBatch::RowBatch(impala::RowDescriptor const&, impala::InboundProtoRowBatch const&, impala::MemTracker*)
>                  impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
>               20.73% impala::TransmitDataResponsePb::~TransmitDataResponsePb()
>               9.98% kudu::rpc::InboundCall::~InboundCall()
>               6.32% kudu::rpc::QueueTransferTask::Run(kudu::rpc::ReactorThread*)
>               4.20% std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::InboundCall*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<u
>               2.03% kudu::rpc::ReactorThread::AsyncHandler(ev::async&, int)
>               1.88% kudu::rpc::RpcContext::~RpcContext()
>               1.00% std::unordered_map<unsigned long, kudu::rpc::Connection::CallAwaitingResponse*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::Connection::CallAwaitingResponse*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned
>               0.71% kudu::rpc::OutboundCall::~OutboundCall()
>               0.65% kudu::Trace::~Trace()
>               0.64% kudu::rpc::Connection::CallAwaitingResponse::~CallAwaitingResponse()
> +   7.90%  impalad  impalad              [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
> {code}
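
The percentages in the nested {{perf}} output above appear to be relative to the parent entry (the children of FetchFromOneSpans sum to 93.95 + 3.03 + 3.02 = 100%). To estimate what share of all samples a single leaf accounts for, the chain of percentages can be multiplied together. The sketch below does that arithmetic; the helper name AbsoluteSharePercent is hypothetical and is not part of Impala, Kudu or perf.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical helper (not from Impala or perf): converts a chain of
// parent-relative perf percentages, outermost first, into a percentage
// of all samples by multiplying the fractions together.
double AbsoluteSharePercent(std::initializer_list<double> relative_percents) {
  double share = 1.0;
  for (double pct : relative_percents) {
    assert(pct >= 0.0 && pct <= 100.0);  // each level is a percentage
    share *= pct / 100.0;
  }
  return share * 100.0;
}

// Chain taken from the trace above:
//   13.12% FetchFromOneSpans -> 93.95% RemoveRange
//   -> 98.16% operator new[] -> 29.20% RowDescriptor copy constructor
double RowDescriptorCopyShare() {
  return AbsoluteSharePercent({13.12, 93.95, 98.16, 29.20});
}

// Chain taken from the trace above:
//   12.49% SpinLoop -> 98.56% SlowLock -> 80.48% InsertRange
double InsertRangeSpinShare() {
  return AbsoluteSharePercent({12.49, 98.56, 80.48});
}
```

Under that reading, the RowDescriptor copy constructor accounts for roughly 3.5% of all samples through this one path, and spinning on the lock in CentralFreeList::InsertRange for roughly 9.9%.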


