Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2023/06/17 00:43:00 UTC

[jira] [Commented] (IMPALA-12223) Coordinator crash in serializing huge profile

    [ https://issues.apache.org/jira/browse/IMPALA-12223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733684#comment-17733684 ] 

Quanlong Huang commented on IMPALA-12223:
-----------------------------------------

FWIW, I used the DEBUG build to reproduce the crash and triggered a core dump in which I can inspect class member values.
{code:java}
(gdb) f 9 
#9  0x0000000001e88e90 in apache::thrift::transport::TBufferBase::write (this=0x7f27a4e034a0, 
    buf=0x1bd62000 "Build Side Codegen Enabled, Hash Table Construction Codegen Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen En"..., len=970757479)
    at ../../../toolchain/toolchain-packages-gcc7.5.0/thrift-0.16.0-p3/include/thrift/transport/TBufferTransports.h:104
104         writeSlow(buf, len)
(gdb) p *this
warning: RTTI symbol not found for class 'apache::thrift::transport::TMemoryBuffer'
$6 = warning: RTTI symbol not found for class 'apache::thrift::transport::TMemoryBuffer'
{<apache::thrift::transport::TVirtualTransport<apache::thrift::transport::TBufferBase, apache::thrift::transport::TTransportDefaults>> = {<apache::thrift::transport::TTransportDefaults> = {<apache::thrift::transport::TTransport> = { 
        _vptr.TTransport = 0x7b50f40 <vtable for apache::thrift::transport::TMemoryBuffer+16>, 
        configuration_ = {<std::__shared_ptr<apache::thrift::TConfiguration, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<apache::thrift::TConfiguration, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, 
            _M_ptr = 0xf7b1a90, _M_refcount = {_M_pi = 0xf7b1a80}}, <No data fields>}, remainingMessageSize_ = 1073741824, knownMessageSize_ = 1073741824}, <No data fields>}, <No data fields>}, 
  rBase_ = 0x7f2495d8e000 "\031\374\251\001\030,Query (id=ff4beb6d3a6f78c6:10098b6400000000)\025\006\031,\030\021InactiveTotalTime\025\n\026", 
  rBound_ = 0x7f2495d8e000 "\031\374\251\001\030,Query (id=ff4beb6d3a6f78c6:10098b6400000000)\025\006\031,\030\021InactiveTotalTime\025\n\026", wBase_ = 0x7f256c738a52 "", wBound_ = 0x7f24b5d8e000 ""}{code}
The last line shows wBase_ = 0x7f256c738a52 and wBound_ = 0x7f24b5d8e000. The end of the buffer (wBound_) is below the current write pointer (wBase_)!
{code:java}
(gdb) p wBound_ - wBase_
warning: RTTI symbol not found for class 'apache::thrift::transport::TMemoryBuffer'
warning: RTTI symbol not found for class 'apache::thrift::transport::TMemoryBuffer'
$7 = -3063589458{code}
Further debugging shows the cause is in how TMemoryBuffer::ensureCanWrite() in the Thrift library updates these pointers:
{code:cpp}
void TMemoryBuffer::ensureCanWrite(uint32_t len) {
  ...
  const uint32_t current_used = bufferSize_ - avail;
  const uint32_t required_buffer_size = len + current_used; // <---- This could overflow
  if (required_buffer_size > maxBufferSize_) {
    throw TTransportException(...);
  }

  // Always grow to the next bigger power of two:
  const double suggested_buffer_size = std::exp2(std::ceil(std::log2(required_buffer_size)));
  // Unless the power of two exceeds maxBufferSize_:
  const uint64_t new_size = static_cast<uint64_t>((std::min)(suggested_buffer_size, static_cast<double>(maxBufferSize_)));

  // Allocate into a new pointer so we don't bork ours if it fails.
  auto* new_buffer = static_cast<uint8_t*>(std::realloc(buffer_, static_cast<std::size_t>(new_size)));
  if (new_buffer == nullptr) {
    throw std::bad_alloc();
  }

  rBase_ = new_buffer + (rBase_ - buffer_);
  rBound_ = new_buffer + (rBound_ - buffer_);
  wBase_ = new_buffer + (wBase_ - buffer_);
  wBound_ = new_buffer + new_size;  // <-- Due to overflow, new_size could be smaller than (wBase_ - buffer_)
  // Note: with realloc() we do not need to free the previous buffer:
  buffer_ = new_buffer;
  bufferSize_ = static_cast<uint32_t>(new_size);
}{code}
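
To make the overflow concrete, here is a minimal sketch that replays the arithmetic with the values from the core dump above. It assumes rBase_ still points at the start of the buffer (so current_used = wBase_ - rBase_) and that the len shown in frame 9 is the one that reached ensureCanWrite(). The function names are hypothetical, and the widened 64-bit sum is only an illustration of one way to avoid the wrap, not the actual THRIFT-5716 patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Mirrors the 32-bit sum in the quoted ensureCanWrite(): wraps modulo 2^32.
uint32_t required_size_wrapping(uint32_t current_used, uint32_t len) {
  return len + current_used;
}

// Hypothetical alternative: widen to 64 bits before adding, so the sum cannot wrap.
uint64_t required_size_widened(uint32_t current_used, uint32_t len) {
  return static_cast<uint64_t>(len) + current_used;
}

// Same "next power of two" rounding as the quoted code, without the
// maxBufferSize_ clamp (the clamp does not kick in for the values below).
uint64_t next_power_of_two(uint64_t required) {
  const double rounded = std::exp2(std::ceil(std::log2(static_cast<double>(required))));
  return static_cast<uint64_t>(rounded);
}

// Replays the computation with the numbers taken from the gdb session.
void print_overflow_demo() {
  // Assumption: buffer_ == rBase_, so this is 0x7f256c738a52 - 0x7f2495d8e000.
  const uint32_t current_used = 3600460370u;
  const uint32_t len = 970757479u;  // len argument shown in frame 9

  const uint32_t wrapped = required_size_wrapping(current_used, len);
  const uint64_t widened = required_size_widened(current_used, len);
  const uint64_t new_size = next_power_of_two(wrapped);

  // The wrapped sum is what lets the maxBufferSize_ check pass.
  assert(wrapped < widened);

  std::printf("wrapped required size: %u\n", static_cast<unsigned>(wrapped));
  std::printf("widened required size: %llu\n", static_cast<unsigned long long>(widened));
  std::printf("new_size from wrapped: %llu\n", static_cast<unsigned long long>(new_size));
  std::printf("new_size < current_used: %s\n", new_size < current_used ? "yes" : "no");
}
```

Under these assumptions the wrapped sum rounds up to a 512 MB buffer, which matches wBound_ - rBase_ = 0x20000000 in the dump, while about 3.35 GB are already in use. That is why wBound_ ends up below wBase_.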

> Coordinator crash in serializing huge profile
> ---------------------------------------------
>
>                 Key: IMPALA-12223
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12223
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.2.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> Bugs like IMPALA-11200 and IMPALA-12204 can produce huge profiles (>4GB). When serializing such a profile, the coordinator might crash. Here is the resolved backtrace:
> {noformat}
> (gdb) bt
> #0  0x00007f208a9bd514 in __memcpy_ssse3_back () from /lib64/libc.so.6
> #1  0x00000000030a2a82 in apache::thrift::transport::TMemoryBuffer::writeSlow(unsigned char const*, unsigned int) ()
> #2  0x0000000001156d25 in apache::thrift::protocol::TCompactProtocolT<apache::thrift::transport::TMemoryBuffer>::writeBinary (this=0x269e1970, str=...)
>     at ../../../toolchain/toolchain-packages-gcc7.5.0/thrift-0.16.0-p3/include/thrift/protocol/TCompactProtocol.tcc:285
> #3  0x00000000011fb3d2 in writeString (str=..., this=0x269e1970) at ../../../toolchain/toolchain-packages-gcc7.5.0/thrift-0.16.0-p3/include/thrift/protocol/TProtocol.h:345
> #4  impala::TRuntimeProfileNode::write<apache::thrift::protocol::TProtocol> (this=this@entry=0xf28f2490, oprot=oprot@entry=0x269e1970) at ../../generated-sources/gen-cpp/RuntimeProfile_types.tcc:2003
> #5  0x00000000011fd18b in impala::TRuntimeProfileTree::write<apache::thrift::protocol::TProtocol> (this=0x7f16c4ba0920, oprot=0x269e1970) at ../../generated-sources/gen-cpp/RuntimeProfile_types.tcc:2181
> #6  0x0000000001686020 in SerializeToBuffer<impala::TRuntimeProfileTree> (buffer=<synthetic pointer>, len=<synthetic pointer>, obj=0x7f16c4ba0920, this=0x7f16c4ba08e0) at ../rpc/thrift-util.h:83
> #7  SerializeToVector<impala::TRuntimeProfileTree> (result=<synthetic pointer>, obj=0x7f16c4ba0920, this=0x7f16c4ba08e0) at ../rpc/thrift-util.h:71
> #8  impala::RuntimeProfile::Compress (this=this@entry=0x32d1f2c0, out=out@entry=0x7f16c4ba0c20) at runtime-profile.cc:1563
> #9  0x0000000001686538 in impala::RuntimeProfile::SerializeToArchiveString (this=this@entry=0x32d1f2c0, out=0x7f16c4ba0e60) at runtime-profile.cc:1627
> #10 0x00000000013cf91e in impala::ImpalaServer::GetRuntimeProfileOutput (this=this@entry=0xefa7400, user=..., query_handle=..., format=format@entry=impala::TRuntimeProfileFormat::BASE64, profile=profile@entry=0x7f16c4ba0dd0)
>     at impala-server.cc:695
> #11 0x00000000013d167e in impala::ImpalaServer::GetRuntimeProfileOutput (this=this@entry=0xefa7400, query_id=..., user=..., format=format@entry=impala::TRuntimeProfileFormat::BASE64, profile=profile@entry=0x7f16c4ba0dd0) at impala-server.cc:809
> #12 0x00000000013b3bff in impala::ImpalaHttpHandler::QueryProfileHelper (this=0xe3e8660, req=..., document=document@entry=0x7f16c4ba1130, format=format@entry=impala::TRuntimeProfileFormat::BASE64) at impala-http-handler.cc:330
> #13 0x00000000013b5c86 in impala::ImpalaHttpHandler::QueryProfileEncodedHandler (this=<optimized out>, req=..., document=0x7f16c4ba1130) at impala-http-handler.cc:344
> #14 0x00000000016cf73d in operator() (a1=0x7f16c4ba1130, a0=..., this=0xf507538) at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #15 impala::Webserver::RenderUrlWithTemplate (this=this@entry=0xe62eec0, connection=connection@entry=0x401aa000, req=..., url_handler=..., output=output@entry=0x7f16c4ba1730, content_type=content_type@entry=0x7f16c4ba15bc) at webserver.cc:897
> #16 0x00000000016d1cf3 in impala::Webserver::BeginRequestCallback (this=0xe62eec0, connection=0x401aa000, request_info=0x401aa000) at webserver.cc:772
> #17 0x00000000016e7ed1 in handle_request ()
> #18 0x00000000016ea5f8 in worker_thread ()
> #19 0x00007f208de89ea5 in start_thread () from /lib64/libpthread.so.0
> #20 0x00007f208a965b0d in clone () from /lib64/libc.so.6{noformat}
> It crashes in memcpy, at a move instruction that writes to memory:
> {code:java}
> (gdb) x/5i $pc-6
>    0x7f208a9bd50e <__memcpy_ssse3_back+6302>:	add    %al,(%rax)
>    0x7f208a9bd510 <__memcpy_ssse3_back+6304>:	movdqu (%rsi),%xmm1
> => 0x7f208a9bd514 <__memcpy_ssse3_back+6308>:	movdqu %xmm0,(%r8)
>    0x7f208a9bd519 <__memcpy_ssse3_back+6313>:	movdqa %xmm1,(%rdi)
>    0x7f208a9bd51d <__memcpy_ssse3_back+6317>:	sub    $0x10,%rdx {code}
> Further debugging shows the cause is a write overflow in the Thrift library (THRIFT-5716). The bug has existed since thrift-0.14.0.


