Posted to issues@impala.apache.org by "Michael Ho (JIRA)" <ji...@apache.org> on 2018/02/21 07:40:00 UTC
[jira] [Resolved] (IMPALA-5528) tcmalloc contention much higher with concurrency after KRPC patch
[ https://issues.apache.org/jira/browse/IMPALA-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Ho resolved IMPALA-5528.
--------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.12.0, Impala 3.0
> tcmalloc contention much higher with concurrency after KRPC patch
> -----------------------------------------------------------------
>
> Key: IMPALA-5528
> URL: https://issues.apache.org/jira/browse/IMPALA-5528
> Project: IMPALA
> Issue Type: Sub-task
> Components: Distributed Exec
> Affects Versions: Impala 2.10.0
> Reporter: Henry Robinson
> Assignee: Mostafa Mokhtar
> Priority: Critical
> Fix For: Impala 3.0, Impala 2.12.0
>
>
> Our testing has revealed that under high concurrency (e.g. the {{many_independent_fragment_instances}} primitive), KRPC slows down execution significantly.
> This JIRA tracks the overall issue and links to JIRAs for specific spot fixes. The {{perf}} output below was captured on one node of a 16-node cluster while running the {{many_independent_fragment_instances}} primitive.
> {code}
> - 13.12% impalad impalad [.] tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
> - tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
> - 93.95% tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
> - tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)
> - 98.16% operator new[](unsigned long)
> 29.20% impala::RowDescriptor::RowDescriptor(impala::RowDescriptor const&)
> 16.85% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
> 12.58% impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
> 7.42% kudu::rpc::OutboundTransfer::CreateForCallResponse(std::vector<kudu::Slice, std::allocator<kudu::Slice> > const&, kudu::rpc::TransferCallbacks*)
> + 4.34% impala::Codec::CreateDecompressor(impala::MemPool*, bool, impala::THdfsCompression::type, boost::scoped_ptr<impala::Codec>*)
> 4.09% kudu::Trace::Trace()
> 3.79% std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)
> + 3.59% kudu::rpc::InboundCall::InboundCall(kudu::rpc::Connection*)
> 2.66% void std::vector<impala::MemPool::ChunkInfo, std::allocator<impala::MemPool::ChunkInfo> >::_M_emplace_back_aux<impala::MemPool::ChunkInfo>(impala::MemPool::ChunkInfo&&)
> + 2.57% kudu::rpc::Connection::HandleIncomingCall(gscoped_ptr<kudu::rpc::InboundTransfer, kudu::DefaultDeleter<kudu::rpc::InboundTransfer> >)
> 2.04% std::vector<kudu::Slice, std::allocator<kudu::Slice> >::reserve(unsigned long)
> 1.92% kudu::rpc::RequestHeader::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
> 1.91% kudu::rpc::RemoteMethodPB::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
> 1.48% kudu::rpc::Connection::ReadHandler(ev::io&, int)
> 0.87% kudu::HeapBufferAllocator::AllocateInternal(unsigned long, unsigned long, kudu::BufferAllocator*)
> 0.79% kudu::faststring::GrowArray(unsigned long)
> 0.72% kudu::rpc::OutboundTransfer::CreateForCallRequest(int, std::vector<kudu::Slice, std::allocator<kudu::Slice> > const&, kudu::rpc::TransferCallbacks*)
> 0.69% kudu::rpc::Connection::QueueOutboundCall(std::shared_ptr<kudu::rpc::OutboundCall> const&)
> 0.69% kudu::ArenaBase<true>::ArenaBase(unsigned long, unsigned long)
> 0.68% void std::vector<std::unique_ptr<kudu::ArenaBase<true>::Component, std::default_delete<kudu::ArenaBase<true>::Component> >, std::allocator<std::unique_ptr<kudu::ArenaBase<true>::Component, std::default_delete<kudu::ArenaBase<true>::Component> > > >::_M_emplace_back_aux<std::unique_ptr<kudu::A
> 0.57% impala::TransmitDataResponsePb::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
> + 1.84% tc_malloc
> + 3.03% tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)
> + 3.02% tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)
> - 12.49% impalad impalad [.] SpinLock::SpinLoop()
> - SpinLock::SpinLoop()
> - 98.56% SpinLock::SlowLock()
> - 80.48% tcmalloc::CentralFreeList::InsertRange(void*, void*, int)
> - tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
> - 99.99% tcmalloc::ThreadCache::Scavenge()
> - operator delete[](void*, std::nothrow_t const&)
> - 22.51% impala::RowBatch::RowBatch(impala::RowDescriptor const&, impala::InboundProtoRowBatch const&, impala::MemTracker*)
> impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
> 21.66% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
> 19.52% impala::TransmitDataResponsePb::~TransmitDataResponsePb()
> 15.30% kudu::rpc::InboundCall::~InboundCall()
> 5.69% kudu::rpc::QueueTransferTask::Run(kudu::rpc::ReactorThread*)
> 3.97% std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::InboundCall*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned long, kudu::rpc::InboundCall*, st
> 2.44% kudu::rpc::RpcContext::~RpcContext()
> 2.20% kudu::rpc::ReactorThread::AsyncHandler(ev::async&, int)
> 1.91% std::unordered_map<unsigned long, kudu::rpc::Connection::CallAwaitingResponse*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::Connection::CallAwaitingResponse*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<
> 1.05% kudu::Trace::~Trace()
> 0.50% kudu::rpc::Connection::CallAwaitingResponse::~CallAwaitingResponse()
> + 9.38% tcmalloc::ThreadCache::IncreaseCacheLimit()
> + 7.43% tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
> + 1.50% tcmalloc::CentralFreeList::Populate()
> + 1.19% tcmalloc::CentralFreeList::ReleaseToSpans(void*)
> + 1.13% tcmalloc::CentralFreeList::InsertRange(void*, void*, int)
> - 8.95% impalad impalad [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
> - tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
> - 99.71% tcmalloc::ThreadCache::Scavenge()
> - operator delete[](void*, std::nothrow_t const&)
> 27.47% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
> - 22.12% impala::RowBatch::RowBatch(impala::RowDescriptor const&, impala::InboundProtoRowBatch const&, impala::MemTracker*)
> impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
> 20.73% impala::TransmitDataResponsePb::~TransmitDataResponsePb()
> 9.98% kudu::rpc::InboundCall::~InboundCall()
> 6.32% kudu::rpc::QueueTransferTask::Run(kudu::rpc::ReactorThread*)
> 4.20% std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::InboundCall*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<u
> 2.03% kudu::rpc::ReactorThread::AsyncHandler(ev::async&, int)
> 1.88% kudu::rpc::RpcContext::~RpcContext()
> 1.00% std::unordered_map<unsigned long, kudu::rpc::Connection::CallAwaitingResponse*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::Connection::CallAwaitingResponse*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned
> 0.71% kudu::rpc::OutboundCall::~OutboundCall()
> 0.65% kudu::Trace::~Trace()
> 0.64% kudu::rpc::Connection::CallAwaitingResponse::~CallAwaitingResponse()
> + 7.90% impalad impalad [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
> {code}
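The profile above attributes most of the time to tcmalloc's central free list, reached from both {{operator new[]}} and {{operator delete[]}}. One general pattern that tends to produce this shape with a thread-caching allocator is allocating buffers on one thread and freeing them on another: the allocating thread's cache keeps running dry while the freeing thread's cache keeps overflowing, so both sides go back to the shared, lock-protected central lists. The following is a minimal sketch of that pattern only. It is an assumption for illustration, not a diagnosis taken from this ticket, and {{HandoffQueue}} and {{RunCrossThreadFrees}} are hypothetical names rather than Impala or Kudu code.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical hand-off queue, standing in for any structure that passes
// heap buffers from one thread to another.
class HandoffQueue {
 public:
  void Push(std::unique_ptr<char[]> buf) {
    {
      std::lock_guard<std::mutex> l(mu_);
      q_.push_back(std::move(buf));
    }
    cv_.notify_one();
  }

  // Blocks until a buffer is available. Returns nullptr once the queue has
  // been closed and fully drained.
  std::unique_ptr<char[]> Pop() {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return !q_.empty() || closed_; });
    if (q_.empty()) return nullptr;
    std::unique_ptr<char[]> buf = std::move(q_.front());
    q_.pop_front();
    return buf;
  }

  void Close() {
    {
      std::lock_guard<std::mutex> l(mu_);
      closed_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::unique_ptr<char[]>> q_;
  bool closed_ = false;
};

// Starts `producers` threads that only allocate and `consumers` threads that
// only free. Returns the number of buffers freed by consumer threads.
std::size_t RunCrossThreadFrees(int producers, int consumers,
                                int per_producer, std::size_t bytes) {
  assert(producers >= 0 && consumers > 0);
  HandoffQueue queue;
  std::atomic<std::size_t> freed{0};

  std::vector<std::thread> consumer_threads;
  for (int i = 0; i < consumers; ++i) {
    consumer_threads.emplace_back([&] {
      while (std::unique_ptr<char[]> buf = queue.Pop()) {
        buf.reset();  // operator delete[] runs here, on the consumer thread.
        ++freed;
      }
    });
  }

  std::vector<std::thread> producer_threads;
  for (int i = 0; i < producers; ++i) {
    producer_threads.emplace_back([&] {
      for (int j = 0; j < per_producer; ++j) {
        // operator new[] runs here, on the producer thread.
        queue.Push(std::unique_ptr<char[]>(new char[bytes]));
      }
    });
  }

  for (auto& t : producer_threads) t.join();
  queue.Close();
  for (auto& t : consumer_threads) t.join();
  return freed.load();
}
```

Calling, for example, {{RunCrossThreadFrees(4, 4, 1000, 256)}} from a small driver linked against tcmalloc and profiling it with {{perf}} is one way to check whether central free list symbols show up for a given allocator build. Whether they do, and how prominently, depends on the allocator version and its thread-cache settings.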