Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2019/11/04 14:29:38 UTC

[GitHub] [incubator-doris] imay opened a new issue #2128: Backend's broker ping worker hang

imay opened a new issue #2128: Backend's broker ping worker hang
URL: https://github.com/apache/incubator-doris/issues/2128
 
 
   The stack of the broker ping worker looks like the one below. The worker is waiting for the result of an RPC. It may be stuck in this RPC, which would prevent this backend from pinging the other brokers. Those brokers will then conclude that this backend is down and clear all of its sessions, so any processing in those sessions will fail.
   
   ```
   #0  0x00007ff9ad2fca5b in recv () from /lib64/libc.so.6
   #1  0x00000000022e543d in apache::thrift::transport::TSocket::read (this=0x4389ef260, buf=0x33606da00 "", len=512) at src/thrift/transport/TSocket.cpp:545
   #2  0x00000000022ea385 in read (len=<optimized out>, buf=<optimized out>, this=<optimized out>) at ./src/thrift/transport/TTransport.h:105
   #3  apache::thrift::transport::TBufferedTransport::readSlow (this=0x1bf37ed20, buf=0x7ff953734e20 "`NsS\371\177", len=4) at src/thrift/transport/TBufferTransports.cpp:53
   #4  0x00000000010d6bae in read (len=4, buf=0x7ff953734e20 "`NsS\371\177", this=0x1bf37ed20) at /thirdparty/installed/include/thrift/transport/TBufferTransports.h:71
   #5  apache::thrift::transport::readAll<apache::thrift::transport::TBufferBase> (trans=..., buf=0x7ff953734e20 "`NsS\371\177", len=4) at /thirdparty/installed/include/thrift/transport/TTransport.h:41
   #6  0x0000000000fa1f5e in readAll (len=4, buf=0x7ff953734e20 "`NsS\371\177", this=<optimized out>) at /thirdparty/installed/include/thrift/transport/TTransport.h:121
   #7  readI32 (this=0x60d3e5a40, i32=<synthetic pointer>) at /thirdparty/installed/include/thrift/protocol/TBinaryProtocol.tcc:371
   #8  apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport, apache::thrift::protocol::TNetworkBigEndian>::readMessageBegin (this=0x60d3e5a40, name=..., messageType=@0x7ff953734e8c: 32761, seqid=@0x7ff953734e88: 0) at /thirdparty/installed/include/thrift/protocol/TBinaryProtocol.tcc:203
   #9  0x00000000011a8187 in readMessageBegin (seqid=@0x7ff953734e88: 0, messageType=@0x7ff953734e8c: 32761, name=..., this=<optimized out>) at /thirdparty/installed/include/thrift/protocol/TProtocol.h:431
   #10 doris::TPaloBrokerServiceClient::recv_ping (this=0x60d3e5e80, _return=...) at /core/gensrc/build/gen_cpp/TPaloBrokerService.cpp:2920
   #11 0x00000000010478aa in doris::BrokerMgr::ping (this=this@entry=0x554f680, addr=...) at /core/be/src/runtime/broker_mgr.cpp:80
   #12 0x00000000010479db in doris::BrokerMgr::ping_worker (this=0x554f680) at /core/be/src/runtime/broker_mgr.cpp:97
   #13 0x0000000002d60c9f in std::execute_native_thread_routine (__p=0x5328300) at ../../../../../gcc-7.3.0/libstdc++-v3/src/c++11/thread.cc:83
   #14 0x00007ff9acfe8e25 in start_thread () from /lib64/libpthread.so.0
   #15 0x00007ff9ad2fbbad in clone () from /lib64/libc.so.6
   ```
   
   I also see log entries like the following in the broker:
   
   ```
   2019-11-04 22:10:24,601 INFO pool-3-thread-1id [ClientContextManager$CheckClientExpirationTask.run():139] client [1.1.1.1:9060] is expired, remove it from contexts. last ping time is 1572876272871
   ```
   
   Maybe we can give this RPC a short timeout, such as 500 ms, so that a stuck ping fails fast instead of blocking the worker.
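
A minimal sketch of the idea, assuming a plain POSIX socket rather than the actual Doris or Thrift code path: once a receive timeout is set on the socket, a `recv()` against a silent peer returns an error instead of blocking forever, as frame #0 of the stack above does. The helper names `set_recv_timeout_ms` and `silent_peer_times_out` are hypothetical and exist only for this illustration. As far as I know, Thrift's C++ `TSocket` offers `setRecvTimeout(int ms)` for the same purpose, which is probably where a real fix would go.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper (not Doris code): bound how long recv() may block on fd.
// Returns true if the socket option was applied.
bool set_recv_timeout_ms(int fd, int timeout_ms) {
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Hypothetical stand-in for a ping whose peer never replies.
// Returns true if recv() gave up with a timeout instead of hanging.
bool silent_peer_times_out(int timeout_ms) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;

    bool timed_out = false;
    if (set_recv_timeout_ms(fds[0], timeout_ms)) {
        char buf[4];
        // fds[1] never writes, so without the timeout this call would block.
        ssize_t n = recv(fds[0], buf, sizeof(buf), 0);
        timed_out = (n < 0) && (errno == EAGAIN || errno == EWOULDBLOCK);
    }
    close(fds[0]);
    close(fds[1]);
    return timed_out;
}
```

Calling `silent_peer_times_out(500)` should come back after roughly half a second with `true`. The ping worker could treat that error as one failed ping and move on to the next broker, so a single unresponsive broker would not cause sessions to expire on the others.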
