Posted to dev@brpc.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:48:15 UTC

[GitHub] [incubator-brpc] acelyc111 opened a new issue #1188: coredump in bthread::id_create_impl

acelyc111 opened a new issue #1188:
URL: https://github.com/apache/incubator-brpc/issues/1188


   **Describe the bug**
   A Doris process that uses the brpc library produced the following coredump stack:
   ```
   Core was generated by `/home/work/app/doris/c3prc-bigbi/be/package/be/lib/palo_be'.
   Program terminated with signal 11, Segmentation fault.
   #0  bthread::id_create_impl (id=id@entry=0x7f09b1140290, data=data@entry=0x7ac5688, on_error=on_error@entry=0x0,
       on_error2=on_error2@entry=0x1b9fea0 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
       at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:333
   333	/root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp: No such file or directory.
   Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 zlib-1.2.7-17.el7.x86_64
   (gdb) bt
   #0  bthread::id_create_impl (id=id@entry=0x7f09b1140290, data=data@entry=0x7ac5688, on_error=on_error@entry=0x0,
       on_error2=on_error2@entry=0x1b9fea0 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
       at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:333
   #1  0x0000000001d1387d in bthread_id_create2 (id=id@entry=0x7f09b1140290, data=data@entry=0x7ac5688,
       on_error=on_error@entry=0x1b9fea0 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
       at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:693
   #2  0x0000000001b9a86d in brpc::Controller::call_id (this=this@entry=0x7ac5688) at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/brpc/controller.cpp:1213
   #3  0x0000000001b9634d in brpc::Channel::CallMethod (this=0x1a31af00, method=0x21555800, controller_base=0x7ac5688, request=0x1efb80260, response=0x7ac58d8, done=0x7ac5680) at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/brpc/channel.cpp:394
   #4  0x00000000013659bf in palo::PInternalService_Stub::transmit_data (this=<optimized out>, controller=0x7ac5688, request=0x1efb80260, response=0x7ac58d8, done=0x7ac5680) at /builds/olap/doris/gensrc/build/gen_cpp/palo_internal_service.pb.cc:319
   #5  0x00000000015fb4a1 in doris::DataStreamSender::Channel::send_batch (this=this@entry=0x1efb80160, batch=batch@entry=0x0, eos=eos@entry=true) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:232
   #6  0x00000000015fc03a in doris::DataStreamSender::Channel::close_internal (this=0x1efb80160) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:289
   #7  0x00000000015fc215 in close (state=0x905baa00, this=<optimized out>) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:296
   #8  doris::DataStreamSender::close (this=0xad029c0, state=0x905baa00, exec_status=...) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:607
   #9  0x00000000010208d3 in doris::PlanFragmentExecutor::open_internal (this=this@entry=0x2655c5930) at /builds/olap/doris/be/src/runtime/plan_fragment_executor.cpp:326
   #10 0x0000000001020acc in doris::PlanFragmentExecutor::open (this=this@entry=0x2655c5930) at /builds/olap/doris/be/src/runtime/plan_fragment_executor.cpp:259
   #11 0x0000000000fb1267 in doris::FragmentExecState::execute (this=0x2655c58c0) at /builds/olap/doris/be/src/runtime/fragment_mgr.cpp:211
   #12 0x0000000000fb2d16 in doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) (this=0x692fc00, exec_state=..., cb=...) at /builds/olap/doris/be/src/runtime/fragment_mgr.cpp:394
   #13 0x0000000000fb96b8 in __invoke_impl<void, void (doris::FragmentMgr::*&)(std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>), doris::FragmentMgr*&, std::shared_ptr<doris::FragmentExecState>&, std::function<void(doris::PlanFragmentExecutor*)>&> (__t=@0x20fbf210: 0x692fc00, __f=
       @0x20fbf1d0: (void (doris::FragmentMgr::*)(doris::FragmentMgr * const, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>)) 0xfb2cf0 <doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>) at /usr/include/c++/7.3.0/bits/invoke.h:73
   #14 __invoke<void (doris::FragmentMgr::*&)(std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>), doris::FragmentMgr*&, std::shared_ptr<doris::FragmentExecState>&, std::function<void(doris::PlanFragmentExecutor*)>&> (__fn=
       @0x20fbf1d0: (void (doris::FragmentMgr::*)(doris::FragmentMgr * const, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>)) 0xfb2cf0 <doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>) at /usr/include/c++/7.3.0/bits/invoke.h:95
   #15 __call<void, 0, 1, 2> (__args=..., this=0x20fbf1d0) at /usr/include/c++/7.3.0/functional:632
   #16 operator()<> (this=0x20fbf1d0) at /usr/include/c++/7.3.0/functional:718
   #17 boost::detail::function::void_function_obj_invoker0<std::_Bind_result<void, void (doris::FragmentMgr::*(doris::FragmentMgr*, std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>))(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>, void>::invoke(boost::detail::function::function_buffer&) (function_obj_ptr=...) at /var/local/thirdparty/installed/include/boost/function/function_template.hpp:159
   #18 0x0000000000fb24d4 in operator() (this=0x3b1cd01c0) at /var/local/thirdparty/installed/include/boost/function/function_template.hpp:759
   #19 doris::fragment_executor (param=0x3b1cd01c0) at /builds/olap/doris/be/src/runtime/fragment_mgr.cpp:419
   #20 0x00007f0b08218dc5 in start_thread () from /lib64/libpthread.so.0
   #21 0x00007f0b0852473d in clone () from /lib64/libc.so.6
   (gdb) f 0
   #0  bthread::id_create_impl (id=id@entry=0x7f09b1140290, data=data@entry=0x7ac5688, on_error=on_error@entry=0x0,
       on_error2=on_error2@entry=0x1b9fea0 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
       at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:333
   333	in /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp
   (gdb) p butex
   $1 = (uint32_t *) 0x0
   (gdb)
   ```
   There is another, similar stack:
   ```
   Core was generated by `/home/work/app/doris/c3prc-whalecore/be/package/be/lib/palo_be'.
   Program terminated with signal 6, Aborted.
   #0  0x00007fafe031f1d7 in raise () from /lib64/libc.so.6
   Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 zlib-1.2.7-17.el7.x86_64
   (gdb) bt
   #0  0x00007fafe031f1d7 in raise () from /lib64/libc.so.6
   #1  0x00007fafe03208c8 in abort () from /lib64/libc.so.6
   #2  0x000000000230f3b6 in google::DumpStackTraceAndExit () at src/utilities.cc:147
   #3  0x00000000023066bd in google::LogMessage::Fail () at src/logging.cc:1599
   #4  0x0000000002308544 in google::LogMessage::SendToLog (this=0x7faf5a0f28a0) at src/logging.cc:1553
   #5  0x00000000023061e4 in google::LogMessage::Flush (this=0x7faf5a0f28a0) at src/logging.cc:1422
   #6  0x0000000002308f79 in google::LogMessageFatal::~LogMessageFatal (this=<optimized out>, __in_chrg=<optimized out>) at src/logging.cc:2125
   #7  0x000000000259b0a0 in bthread::id_create_impl (id=id@entry=0x7faf5a0f2900, data=data@entry=0x83f09408, on_error=on_error@entry=0x0,
       on_error2=on_error2@entry=0x2427bb0 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
       at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:331
   #8  0x000000000259b5cd in bthread_id_create2 (id=id@entry=0x7faf5a0f2900, data=data@entry=0x83f09408,
       on_error=on_error@entry=0x2427bb0 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
       at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:693
   #9  0x000000000242257d in brpc::Controller::call_id (this=this@entry=0x83f09408) at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/brpc/controller.cpp:1213
   #10 0x000000000241e05d in brpc::Channel::CallMethod (this=0xcd14600, method=0x10bd2400, controller_base=0x83f09408, request=0x139d77180, response=0x83f09658, done=0x83f09400) at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/brpc/channel.cpp:394
   #11 0x000000000134fbff in palo::PInternalService_Stub::transmit_data (this=<optimized out>, controller=0x83f09408, request=0x139d77180, response=0x83f09658, done=0x83f09400) at /builds/olap/doris/gensrc/build/gen_cpp/palo_internal_service.pb.cc:319
   #12 0x00000000015d8a91 in doris::DataStreamSender::Channel::send_batch (this=this@entry=0x139d77080, batch=batch@entry=0x139d77138, eos=eos@entry=true) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:232
   #13 0x00000000015d8d64 in doris::DataStreamSender::Channel::send_current_batch (this=this@entry=0x139d77080, eos=eos@entry=true) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:275
   #14 0x00000000015d9661 in doris::DataStreamSender::Channel::close_internal (this=0x139d77080) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:287
   #15 0x00000000015d9805 in close (state=0x1712ed800, this=<optimized out>) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:296
   #16 doris::DataStreamSender::close (this=0x48cc6820, state=0x1712ed800, exec_status=...) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:607
   #17 0x0000000001054f13 in doris::PlanFragmentExecutor::open_internal (this=this@entry=0x863465f0) at /builds/olap/doris/be/src/runtime/plan_fragment_executor.cpp:351
   #18 0x0000000001055114 in doris::PlanFragmentExecutor::open (this=this@entry=0x863465f0) at /builds/olap/doris/be/src/runtime/plan_fragment_executor.cpp:284
   #19 0x0000000000fdc7d7 in doris::FragmentExecState::execute (this=0x86346580) at /builds/olap/doris/be/src/runtime/fragment_mgr.cpp:209
   #20 0x0000000000fde5f6 in doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) (this=0x6e9b180, exec_state=..., cb=...) at /builds/olap/doris/be/src/runtime/fragment_mgr.cpp:393
   #21 0x0000000000fe4724 in operator() (a2=<error reading variable: access outside bounds of object referenced via synthetic pointer>, a1=..., p=<optimized out>, this=<optimized out>) at /var/local/thirdparty/installed/include/boost/bind/mem_fn_template.hpp:280
   #22 operator()<boost::_mfi::mf2<void, doris::FragmentMgr, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)> >, boost::_bi::list0> (a=<synthetic pointer>, f=..., this=<optimized out>)
       at /var/local/thirdparty/installed/include/boost/bind/bind.hpp:398
   #23 operator() (this=<optimized out>) at /var/local/thirdparty/installed/include/boost/bind/bind.hpp:1294
   #24 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, doris::FragmentMgr, std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)> >, boost::_bi::list3<boost::_bi::value<doris::FragmentMgr*>, boost::_bi::value<std::shared_ptr<doris::FragmentExecState> >, boost::_bi::value<std::function<void (doris::PlanFragmentExecutor*)> > > >, void>::invoke(boost::detail::function::function_buffer&) (function_obj_ptr=...)
       at /var/local/thirdparty/installed/include/boost/function/function_template.hpp:159
   #25 0x0000000000edc7e8 in operator() (this=0x7faf5a0f2fc0) at /var/local/thirdparty/installed/include/boost/function/function_template.hpp:759
   #26 doris::ThreadPool::work_thread (this=0x6e9b200, thread_id=<optimized out>) at /builds/olap/doris/be/src/util/thread_pool.hpp:120
   #27 0x0000000001a20a1d in thread_proxy ()
   #28 0x00007fafe00d5dc5 in start_thread () from /lib64/libpthread.so.0
   #29 0x00007fafe03e173d in clone () from /lib64/libc.so.6
   (gdb)
   ```
   Relevant code:
   https://github.com/apache/incubator-brpc/blob/a6ccc96aeb92d178b38885dc7ca3c525e5699648/src/bthread/id.cpp#L321-L345
   **To Reproduce**
   There is no known way to reproduce it reliably, but it occurs fairly often.
   
   **Expected behavior**
   The process runs normally.
   
   **Versions**
   OS:
   Compiler:
   brpc: 0.9.5
   protobuf:
   
   **Additional context/screenshots**
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@brpc.apache.org
For additional commands, e-mail: dev-help@brpc.apache.org


[GitHub] [incubator-brpc] acelyc111 commented on issue #1188: coredump in bthread::id_create_impl

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1188:
URL: https://github.com/apache/incubator-brpc/issues/1188#issuecomment-664810394


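   What this has in common with https://github.com/apache/incubator-brpc/issues/1179 is that both obtain resources from the resource pool, and in both cases the resources obtained seem to be faulty.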




[GitHub] [incubator-brpc] lorinlee commented on issue #1188: coredump in bthread::id_create_impl

Posted by GitBox <gi...@apache.org>.
lorinlee commented on issue #1188:
URL: https://github.com/apache/incubator-brpc/issues/1188#issuecomment-757357095


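   #1325: My guess is that ResourcePool returned an id greater than 65535. Reading the ResourcePool logic, it allows 65536 groups * 65536 blocks * 256 elements, a capacity far larger than 2^32. bthread_id_t is computed as resource_id << 32 | version, so resource_id == 0 and resource_id == 65536 produce the same bthread_id_t value. On return_resource, these two ids would cause the element with resource_id == 0 to be returned twice, and later get_resource calls would then misbehave. #1179 may have the same cause, but there the TimerTasks are not only identical but also linked together, which seems much less likely, and I have not figured that part out yet.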
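
The arithmetic behind this guess can be sketched in a few lines. This is a rough model only: `make_id` and `kNominalCapacity` are hypothetical names, and the sketch assumes the id is a plain 64-bit integer with the slot shifted left by 32 bits, which is not taken from the brpc sources. Under that assumption, two slots that differ by a multiple of 2^32 produce the same id; the exact slot value at which aliasing starts in the real code depends on field widths that this thread does not show.

```cpp
#include <cstdint>

// Hypothetical stand-in for the packing described in the comment
// (resource_id << 32 | version). Not brpc's actual function.
constexpr std::uint64_t make_id(std::uint64_t resource_id, std::uint32_t version) {
    // Bits of resource_id above bit 31 are shifted out of the 64-bit result.
    return (resource_id << 32) | version;
}

// Nominal pool capacity quoted in the comment: groups * blocks * elements.
constexpr std::uint64_t kNominalCapacity = 65536ULL * 65536ULL * 256ULL;

// 2^16 * 2^16 * 2^8 = 2^40, which is more than the 2^32 slots an id can name.
static_assert(kNominalCapacity == (1ULL << 40), "capacity is 2^40");
static_assert(kNominalCapacity > (1ULL << 32), "capacity exceeds 32 bits");

// In this model, slot 0 and slot 2^32 are indistinguishable once packed.
static_assert(make_id(0, 1) == make_id(1ULL << 32, 1), "slots 2^32 apart alias");
```

If two live slots alias like this, returning either id releases the same underlying slot, which matches the double return described above.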
   
   The other problem in this issue is that butex is nullptr. I suspect its creation failed in the first place and nothing checked for that; this PR adds a check: #1326
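
A minimal sketch of the kind of check meant here, assuming a slot whose butex word is allocated separately. `IdSlot`, `create_butex_word`, and `init_slot` are hypothetical names chosen for illustration; this is not the content of #1326.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <new>

// Hypothetical slot type; the real structure in brpc has more fields.
struct IdSlot {
    std::uint32_t* butex = nullptr;
};

// Hypothetical allocator standing in for whatever creates the butex word.
// Returns nullptr on failure instead of throwing.
inline std::uint32_t* create_butex_word() {
    return new (std::nothrow) std::uint32_t(0);
}

// Initialize a slot, reporting ENOMEM instead of dereferencing a null butex.
inline int init_slot(IdSlot* slot,
                     std::uint32_t* (*create)() = create_butex_word) {
    assert(slot != nullptr);
    if (slot->butex == nullptr) {
        slot->butex = create();
        if (slot->butex == nullptr) {
            return ENOMEM;  // the caller can fail the call instead of crashing
        }
    }
    *slot->butex = 1;  // safe: butex was checked above
    return 0;
}
```

With a check like this, a failed allocation would surface as an error code to the caller rather than as the null dereference seen in the first stack.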
   
   @jamesge @acelyc111 Could you please review this and tell me whether my guess is reasonable? Thanks.
   
   




[GitHub] [incubator-brpc] jamesge commented on issue #1188: coredump in bthread::id_create_impl

Posted by GitBox <gi...@apache.org>.
jamesge commented on issue #1188:
URL: https://github.com/apache/incubator-brpc/issues/1188#issuecomment-664812173


   It would be best to rule out the TimerThread problem first (master has been [updated](https://github.com/apache/incubator-brpc/commit/ea87ee715b04e30a4ee255953ac94cd0f6672b59)); memory problems in C++ can be widely interrelated.

