Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2023/06/17 00:43:00 UTC
[jira] [Commented] (IMPALA-12223) Coordinator crash in serializing huge profile
[ https://issues.apache.org/jira/browse/IMPALA-12223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733684#comment-17733684 ]
Quanlong Huang commented on IMPALA-12223:
-----------------------------------------
FWIW, I used a DEBUG build to reproduce the crash and triggered a core dump, in which I could inspect class member values.
{code:java}
(gdb) f 9
#9 0x0000000001e88e90 in apache::thrift::transport::TBufferBase::write (this=0x7f27a4e034a0,
buf=0x1bd62000 "Build Side Codegen Enabled, Hash Table Construction Codegen Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen En"..., len=970757479)
at ../../../toolchain/toolchain-packages-gcc7.5.0/thrift-0.16.0-p3/include/thrift/transport/TBufferTransports.h:104
104 writeSlow(buf, len)
(gdb) p *this
warning: RTTI symbol not found for class 'apache::thrift::transport::TMemoryBuffer'
$6 = warning: RTTI symbol not found for class 'apache::thrift::transport::TMemoryBuffer'
{<apache::thrift::transport::TVirtualTransport<apache::thrift::transport::TBufferBase, apache::thrift::transport::TTransportDefaults>> = {<apache::thrift::transport::TTransportDefaults> = {<apache::thrift::transport::TTransport> = {
_vptr.TTransport = 0x7b50f40 <vtable for apache::thrift::transport::TMemoryBuffer+16>,
configuration_ = {<std::__shared_ptr<apache::thrift::TConfiguration, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<apache::thrift::TConfiguration, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>},
_M_ptr = 0xf7b1a90, _M_refcount = {_M_pi = 0xf7b1a80}}, <No data fields>}, remainingMessageSize_ = 1073741824, knownMessageSize_ = 1073741824}, <No data fields>}, <No data fields>},
rBase_ = 0x7f2495d8e000 "\031\374\251\001\030,Query (id=ff4beb6d3a6f78c6:10098b6400000000)\025\006\031,\030\021InactiveTotalTime\025\n\026",
rBound_ = 0x7f2495d8e000 "\031\374\251\001\030,Query (id=ff4beb6d3a6f78c6:10098b6400000000)\025\006\031,\030\021InactiveTotalTime\025\n\026", wBase_ = 0x7f256c738a52 "", wBound_ = 0x7f24b5d8e000 ""}{code}
The last line shows wBase_ = 0x7f256c738a52 and wBound_ = 0x7f24b5d8e000. The end of the write buffer (wBound_) lies below its start pointer (wBase_)!
{code:java}
(gdb) p wBound_ - wBase_
warning: RTTI symbol not found for class 'apache::thrift::transport::TMemoryBuffer'
warning: RTTI symbol not found for class 'apache::thrift::transport::TMemoryBuffer'
$7 = -3063589458{code}
Further debugging shows the cause is in TMemoryBuffer::ensureCanWrite() inside the thrift lib:
{code:cpp}
void TMemoryBuffer::ensureCanWrite(uint32_t len) {
...
const uint32_t current_used = bufferSize_ - avail;
const uint32_t required_buffer_size = len + current_used; // <---- This could overflow
if (required_buffer_size > maxBufferSize_) {
throw TTransportException(...);
}
// Always grow to the next bigger power of two:
const double suggested_buffer_size = std::exp2(std::ceil(std::log2(required_buffer_size)));
// Unless the power of two exceeds maxBufferSize_:
const uint64_t new_size = static_cast<uint64_t>((std::min)(suggested_buffer_size, static_cast<double>(maxBufferSize_)));
// Allocate into a new pointer so we don't bork ours if it fails.
auto* new_buffer = static_cast<uint8_t*>(std::realloc(buffer_, static_cast<std::size_t>(new_size)));
if (new_buffer == nullptr) {
throw std::bad_alloc();
}
rBase_ = new_buffer + (rBase_ - buffer_);
rBound_ = new_buffer + (rBound_ - buffer_);
wBase_ = new_buffer + (wBase_ - buffer_);
wBound_ = new_buffer + new_size; // <-- Due to overflow, new_size could be smaller than (wBase_ - buffer_)
// Note: with realloc() we do not need to free the previous buffer:
buffer_ = new_buffer;
bufferSize_ = static_cast<uint32_t>(new_size);
}{code}
> Coordinator crash in serializing huge profile
> ---------------------------------------------
>
> Key: IMPALA-12223
> URL: https://issues.apache.org/jira/browse/IMPALA-12223
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.2.0
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> Bugs like IMPALA-11200 and IMPALA-12204 can produce huge profiles (>4GB). When serializing such profiles, the coordinator might crash. Here is the resolved backtrace:
> {noformat}
> (gdb) bt
> #0 0x00007f208a9bd514 in __memcpy_ssse3_back () from /lib64/libc.so.6
> #1 0x00000000030a2a82 in apache::thrift::transport::TMemoryBuffer::writeSlow(unsigned char const*, unsigned int) ()
> #2 0x0000000001156d25 in apache::thrift::protocol::TCompactProtocolT<apache::thrift::transport::TMemoryBuffer>::writeBinary (this=0x269e1970, str=...)
> at ../../../toolchain/toolchain-packages-gcc7.5.0/thrift-0.16.0-p3/include/thrift/protocol/TCompactProtocol.tcc:285
> #3 0x00000000011fb3d2 in writeString (str=..., this=0x269e1970) at ../../../toolchain/toolchain-packages-gcc7.5.0/thrift-0.16.0-p3/include/thrift/protocol/TProtocol.h:345
> #4 impala::TRuntimeProfileNode::write<apache::thrift::protocol::TProtocol> (this=this@entry=0xf28f2490, oprot=oprot@entry=0x269e1970) at ../../generated-sources/gen-cpp/RuntimeProfile_types.tcc:2003
> #5 0x00000000011fd18b in impala::TRuntimeProfileTree::write<apache::thrift::protocol::TProtocol> (this=0x7f16c4ba0920, oprot=0x269e1970) at ../../generated-sources/gen-cpp/RuntimeProfile_types.tcc:2181
> #6 0x0000000001686020 in SerializeToBuffer<impala::TRuntimeProfileTree> (buffer=<synthetic pointer>, len=<synthetic pointer>, obj=0x7f16c4ba0920, this=0x7f16c4ba08e0) at ../rpc/thrift-util.h:83
> #7 SerializeToVector<impala::TRuntimeProfileTree> (result=<synthetic pointer>, obj=0x7f16c4ba0920, this=0x7f16c4ba08e0) at ../rpc/thrift-util.h:71
> #8 impala::RuntimeProfile::Compress (this=this@entry=0x32d1f2c0, out=out@entry=0x7f16c4ba0c20) at runtime-profile.cc:1563
> #9 0x0000000001686538 in impala::RuntimeProfile::SerializeToArchiveString (this=this@entry=0x32d1f2c0, out=0x7f16c4ba0e60) at runtime-profile.cc:1627
> #10 0x00000000013cf91e in impala::ImpalaServer::GetRuntimeProfileOutput (this=this@entry=0xefa7400, user=..., query_handle=..., format=format@entry=impala::TRuntimeProfileFormat::BASE64, profile=profile@entry=0x7f16c4ba0dd0)
> at impala-server.cc:695
> #11 0x00000000013d167e in impala::ImpalaServer::GetRuntimeProfileOutput (this=this@entry=0xefa7400, query_id=..., user=..., format=format@entry=impala::TRuntimeProfileFormat::BASE64, profile=profile@entry=0x7f16c4ba0dd0) at impala-server.cc:809
> #12 0x00000000013b3bff in impala::ImpalaHttpHandler::QueryProfileHelper (this=0xe3e8660, req=..., document=document@entry=0x7f16c4ba1130, format=format@entry=impala::TRuntimeProfileFormat::BASE64) at impala-http-handler.cc:330
> #13 0x00000000013b5c86 in impala::ImpalaHttpHandler::QueryProfileEncodedHandler (this=<optimized out>, req=..., document=0x7f16c4ba1130) at impala-http-handler.cc:344
> #14 0x00000000016cf73d in operator() (a1=0x7f16c4ba1130, a0=..., this=0xf507538) at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #15 impala::Webserver::RenderUrlWithTemplate (this=this@entry=0xe62eec0, connection=connection@entry=0x401aa000, req=..., url_handler=..., output=output@entry=0x7f16c4ba1730, content_type=content_type@entry=0x7f16c4ba15bc) at webserver.cc:897
> #16 0x00000000016d1cf3 in impala::Webserver::BeginRequestCallback (this=0xe62eec0, connection=0x401aa000, request_info=0x401aa000) at webserver.cc:772
> #17 0x00000000016e7ed1 in handle_request ()
> #18 0x00000000016ea5f8 in worker_thread ()
> #19 0x00007f208de89ea5 in start_thread () from /lib64/libpthread.so.0
> #20 0x00007f208a965b0d in clone () from /lib64/libc.so.6{noformat}
> It crashes in memcpy at a move instruction that writes to memory:
> {code:java}
> (gdb) x/5i $pc-6
> 0x7f208a9bd50e <__memcpy_ssse3_back+6302>: add %al,(%rax)
> 0x7f208a9bd510 <__memcpy_ssse3_back+6304>: movdqu (%rsi),%xmm1
> => 0x7f208a9bd514 <__memcpy_ssse3_back+6308>: movdqu %xmm0,(%r8)
> 0x7f208a9bd519 <__memcpy_ssse3_back+6313>: movdqa %xmm1,(%rdi)
> 0x7f208a9bd51d <__memcpy_ssse3_back+6317>: sub $0x10,%rdx {code}
> Further debugging shows the cause is a write overflow in the thrift lib (THRIFT-5716). The bug has existed since thrift-0.14.0.