Posted to dev@brpc.apache.org by GitBox <gi...@apache.org> on 2020/07/22 09:32:15 UTC

[GitHub] [incubator-brpc] acelyc111 opened a new issue #1179: tcmalloc OOM coredump in the TimerThread module

acelyc111 opened a new issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179


   **Describe the bug**
   Core dump stack trace:
   ```
   (gdb) bt
   #0  0x00007f8ce780d1d7 in raise () from /lib64/libc.so.6
   #1  0x00007f8ce780e8c8 in abort () from /lib64/libc.so.6
   #2  0x000000000267a5e5 in __gnu_cxx::__verbose_terminate_handler () at ../../.././libstdc++-v3/libsupc++/vterminate.cc:95
   #3  0x00000000025eaa16 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:47
   #4  0x00000000025eaa61 in std::terminate () at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:57
   #5  0x00000000025e0943 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x34a8248 <typeinfo for std::bad_alloc>, dest=0x25e0160 <std::bad_alloc::~bad_alloc()>) at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:93
   #6  0x000000000251c30c in (anonymous namespace)::handle_oom (retry_fn=retry_fn@entry=0x251cbb0 <(anonymous namespace)::retry_malloc(void*)>, retry_arg=retry_arg@entry=0x4000000000, from_operator=from_operator@entry=true, nothrow=nothrow@entry=false)
       at src/tcmalloc.cc:1249
   #7  0x0000000002684a9b in tcmalloc::cpp_throw_oom (size=size@entry=274877906944) at src/tcmalloc.cc:1717
   #8  0x000000000268608e in do_allocate_full<tcmalloc::cpp_throw_oom> (size=274877906944) at src/tcmalloc.cc:1760
   #9  tcmalloc::allocate_full_cpp_throw_oom (size=size@entry=274877906944) at src/tcmalloc.cc:1772
   #10 0x00000000026861e7 in dispatch_allocate_full<tcmalloc::cpp_throw_oom> (size=274877906944) at src/tcmalloc.cc:1781
   #11 malloc_fast_path<tcmalloc::cpp_throw_oom> (size=size@entry=274877906944) at src/tcmalloc.cc:1852
   #12 tc_new (size=size@entry=274877906944) at src/tcmalloc.cc:1976
   #13 0x0000000001d26a67 in allocate (this=<optimized out>, __n=<optimized out>) at /usr/include/c++/7.3.0/ext/new_allocator.h:111
   #14 allocate (__a=..., __n=<optimized out>) at /usr/include/c++/7.3.0/bits/alloc_traits.h:436
   #15 _M_allocate (this=0x1eec4, this@entry=0x6af09f, __n=<optimized out>) at /usr/include/c++/7.3.0/bits/stl_vector.h:172
   #16 std::vector<bthread::TimerThread::Task*, std::allocator<bthread::TimerThread::Task*> >::_M_realloc_insert<bthread::TimerThread::Task* const&> (this=this@entry=0x7f8bd27d4650, __position=..., __args#0=@0x7f8bd27d4640: 0x17ab5080)
       at /usr/include/c++/7.3.0/bits/vector.tcc:406
   #17 0x0000000001d253ef in push_back (__x=@0x7f8bd27d4640: 0x17ab5080, this=0x7f8bd27d4650) at /usr/include/c++/7.3.0/bits/stl_vector.h:948
   #18 bthread::TimerThread::run (this=0x6fdd380) at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/timer_thread.cpp:354
   #19 0x0000000001d25d09 in bthread::TimerThread::run_this (arg=<optimized out>) at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/timer_thread.cpp:117
   #20 0x00007f8ce75c3dc5 in start_thread () from /lib64/libpthread.so.0
   #21 0x00007f8ce78cf73d in clone () from /lib64/libc.so.6
   (gdb)
   ```
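The request size in frames #7 to #12 shows how far the loop had run before it failed. The worked check below rests on two assumptions: `Task*` pointers are 8 bytes (x86_64), and libstdc++ doubles capacity in `_M_realloc_insert`. Under those assumptions, 274877906944 bytes is exactly 2^38 (256 GiB). That is a new capacity of 2^35 pointers, so `tasks` already held 2^34 (17,179,869,184) entries when the allocation failed. The constant names are illustrative only.

```cpp
#include <cstdint>

// Size passed to tc_new in frame #12 of the backtrace above.
constexpr std::uint64_t kRequestedBytes = 274877906944ULL;
// Assumption: sizeof(bthread::TimerThread::Task*) == 8 on this x86_64 build.
constexpr std::uint64_t kPointerBytes = 8;
// Capacity the vector was trying to grow to.
constexpr std::uint64_t kNewCapacity = kRequestedBytes / kPointerBytes;
// Assumption: libstdc++ grows from size n to 2n when push_back reallocates,
// so the vector already held half of the new capacity.
constexpr std::uint64_t kOldSize = kNewCapacity / 2;

static_assert(kRequestedBytes == (1ULL << 38), "request is 256 GiB");
static_assert(kNewCapacity == (1ULL << 35), "2^35 pointer slots");
static_assert(kOldSize == (1ULL << 34), "about 17.2 billion pushed tasks");
```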
   https://github.com/apache/incubator-brpc/blob/a6ccc96aeb92d178b38885dc7ca3c525e5699648/src/bthread/timer_thread.cpp#L348-L358
   At line 354, far too many tasks were `push_back`ed into `tasks`. Presumably some task's `next` points to itself, so the `for` loop never terminates.
   ```
   (gdb) pvector tasks
   elem[0]: $18 = (bthread::TimerThread::Task *) 0xba143c0
   elem[1]: $19 = (bthread::TimerThread::Task *) 0x95d28800
   elem[2]: $20 = (bthread::TimerThread::Task *) 0xed7fb300
   elem[3]: $21 = (bthread::TimerThread::Task *) 0x1b0dad40
   elem[4]: $22 = (bthread::TimerThread::Task *) 0x19052400
   elem[5]: $23 = (bthread::TimerThread::Task *) 0x16a4469c0
   elem[6]: $24 = (bthread::TimerThread::Task *) 0x17ab5080
   elem[7]: $25 = (bthread::TimerThread::Task *) 0x17ab5080
   elem[8]: $26 = (bthread::TimerThread::Task *) 0x17ab5080
   elem[9]: $27 = (bthread::TimerThread::Task *) 0x17ab5080
   elem[10]: $28 = (bthread::TimerThread::Task *) 0x17ab5080
   ...
   (gdb) p *(bthread::TimerThread::Task *) 0x17ab5080
   $79 = {
     next = 0x17ab5080,                            // next points to itself
     run_time = 1595391360673234,
     fn = 0x1b953c0 <brpc::HandleTimeout(void*)>,
     arg = 0x15cd00004a11,
     task_id = 249108103170,
     version = {
       <std::atomic<unsigned int>> = {
         <std::__atomic_base<unsigned int>> = {
           static _S_alignment = 4,
           _M_i = 58
         }, <No data fields>}, <No data fields>}
   }
   ```
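The dump explains the runaway `push_back`. Once a node's `next` points back to itself, a loop of the form `for (p = head; p != nullptr; p = p->next)` never reaches `nullptr` and keeps yielding the same pointer. That matches `elem[6]` onwards of `tasks` above.

Below is a minimal sketch built on a hypothetical `Node` type, not the real `bthread::TimerThread::Task`:
- `has_cycle` is Floyd's tortoise-and-hare check.
- `collect` is a capped version of the consume loop, so a cycle shows up as repeated pointers instead of unbounded growth.

Neither function is part of brpc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a timer task: only the intrusive link matters here.
struct Node {
    Node* next = nullptr;
};

// Floyd's tortoise-and-hare: true if following `next` from `head` loops.
bool has_cycle(const Node* head) {
    const Node* slow = head;
    const Node* fast = head;
    while (fast != nullptr && fast->next != nullptr) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {
            return true;
        }
    }
    return false;
}

// Same shape as the consume loop, but stops after `limit` pushes.
std::vector<const Node*> collect(const Node* head, std::size_t limit) {
    std::vector<const Node*> out;
    for (const Node* p = head; p != nullptr && out.size() < limit; p = p->next) {
        out.push_back(p);
    }
    return out;
}
```

With a list `a -> b -> c -> c`, `collect(&a, 10)` returns `a`, `b`, then eight copies of `c`. This is the same pattern as the `pvector tasks` output.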
   
   **To Reproduce**
   
   
   **Expected behavior**
   
   
   **Versions**
   OS:
   Compiler:
   brpc: 0.9.5
   protobuf:
   
   **Additional context/screenshots**
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@brpc.apache.org
For additional commands, e-mail: dev-help@brpc.apache.org


[GitHub] [incubator-brpc] vagetablechicken commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
vagetablechicken commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-664184102


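   Besides the OOM from `push_back` on a circular linked list while consuming tasks, we also found OOMs caused by the `TimerThread::Task` resource pool growing too fast.
   In a 20-node cluster, only one node showed the rapid growth of the Task resource pool. The figure below is the growth profile:
   ![image](https://user-images.githubusercontent.com/24697960/88516211-801a6b00-d01f-11ea-965a-43e6dffbfe5d.png)
   Although the growth profile does not account for frees, this figure is roughly equal to the RSS and to tcmalloc's `generic.current_allocated_bytes`.
   Moreover, taking another growth sample 35 min later and diffing the two shows that the resource pool grew by 22 GB in 30 min (same call path as in the figure above; TODO: figure link).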
   
   


[GitHub] [incubator-brpc] acelyc111 commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-665598024


   > Then try removing [this if branch](https://github.com/apache/incubator-brpc/blob/e5bd064b3aed62d35b8d1f6d2456522f9558e4ee/src/bthread/timer_thread.cpp#L370) from the brpc you are using, namely this block:
   > 
   > ```
   >             if (task1->try_delete()) { // already unscheduled
   >                 std::pop_heap(tasks.begin(), tasks.end(), task_greater);
   >                 tasks.pop_back();
   >                 continue;
   >             }
   > ```
   > 
   > Then see whether the problem still reproduces.
   
   @jamesge I tried it, but the circular linked list problem still occurs :(


[GitHub] [incubator-brpc] lorinlee commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
lorinlee commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-670533068


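   @acelyc111 Can this bug be reproduced with the latest brpc? I see that #638 fixed an issue at the same code location you posted, and its description is quite similar.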


[GitHub] [incubator-brpc] acelyc111 commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-670583042


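   @lorinlee I haven't tried the latest version. I did see that fix at the time, but the PR's code seems to be about preventing `p` from being deleted, so I didn't think to try it.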


[GitHub] [incubator-brpc] acelyc111 commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-663334359


   > Then try removing [this if branch](https://github.com/apache/incubator-brpc/blob/e5bd064b3aed62d35b8d1f6d2456522f9558e4ee/src/bthread/timer_thread.cpp#L370) from the brpc you are using, namely this block:
   > 
   > ```
   >             if (task1->try_delete()) { // already unscheduled
   >                 std::pop_heap(tasks.begin(), tasks.end(), task_greater);
   >                 tasks.pop_back();
   >                 continue;
   >             }
   > ```
   > 
   > Then see whether the problem still reproduces.
   
   OK, thanks, I'll give it a try.


[GitHub] [incubator-brpc] morningman commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
morningman commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-662777256


   What is this TimerThread used for?


[GitHub] [incubator-brpc] jamesge commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
jamesge commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-663334021


   Then try removing [this if branch](https://github.com/apache/incubator-brpc/blob/e5bd064b3aed62d35b8d1f6d2456522f9558e4ee/src/bthread/timer_thread.cpp#L370) from the brpc you are using, namely this block:
   ```
               if (task1->try_delete()) { // already unscheduled
                   std::pop_heap(tasks.begin(), tasks.end(), task_greater);
                   tasks.pop_back();
                   continue;
               }
   ```
   Then see whether the problem still reproduces.
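The quoted branch uses a common idiom. `tasks` is kept as a heap ordered by `task_greater`, so `std::pop_heap` moves the heap top to the back and `tasks.pop_back()` then drops it. The sketch below assumes `task1` is that heap top.

It uses a hypothetical `HeapTask` carrying only `run_time`. The comparator body is a guess based on the name `task_greater`; it is not copied from brpc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical task carrying only the field the heap orders by.
struct HeapTask {
    std::int64_t run_time;  // same meaning as Task::run_time in the dumps
};

// Assumed comparator: ">" turns the standard max-heap algorithms into a
// min-heap, so tasks.front() is the task due earliest.
bool task_greater(const HeapTask* a, const HeapTask* b) {
    return a->run_time > b->run_time;
}

// Adds a task while keeping the heap invariant.
void push_task(std::vector<const HeapTask*>& tasks, const HeapTask* t) {
    tasks.push_back(t);
    std::push_heap(tasks.begin(), tasks.end(), task_greater);
}

// Same two steps as the quoted branch: pop_heap moves the top (earliest)
// task to the back, then pop_back drops it from the vector.
const HeapTask* pop_earliest(std::vector<const HeapTask*>& tasks) {
    std::pop_heap(tasks.begin(), tasks.end(), task_greater);
    const HeapTask* earliest = tasks.back();
    tasks.pop_back();
    return earliest;
}
```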


[GitHub] [incubator-brpc] acelyc111 commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-666489282


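   The executable links brpc statically, so to detect memory errors inside brpc, brpc itself also has to be built with ASan, right? A beginner's question: how do I build an ASan version? Building it this way fails for me: https://github.com/apache/incubator-brpc/issues/1186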


[GitHub] [incubator-brpc] jamesge commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
jamesge commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-662815865


   > unschedule it immediately (at this point `_task_head` in the Bucket is not reset to nullptr), and the task is returned to the pool
   
   `unschedule` only sets a [mark](https://github.com/apache/incubator-brpc/blob/e5bd064b3aed62d35b8d1f6d2456522f9558e4ee/src/bthread/timer_thread.cpp#L254); the actual deletion happens inside the TimerThread.
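Below is a simplified sketch of this "mark now, delete later" split. It assumes a per-task version counter like the `version` field in the dumps and the `id_version + 2` check in the log. The steps are:
1. The scheduling side records the version it handed out (`id_version`).
2. `unschedule` only moves the counter to `id_version + 2`.
3. The timer thread later sees that mark and recycles the task.

`MiniTask`, `mini_unschedule` and `mini_try_delete` are hypothetical names. The real code is at the linked lines and also handles the "currently running" state, which this sketch omits.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified timer task: no callback, no pool, just the version.
struct MiniTask {
    std::atomic<std::uint32_t> version{0};
    std::uint32_t id_version = 0;  // version captured when the task was scheduled
};

// Caller side: succeeds only if nobody has run or unscheduled the task yet.
// It only marks the task (version -> id_version + 2); memory is not freed here.
bool mini_unschedule(MiniTask& t) {
    std::uint32_t expected = t.id_version;
    return t.version.compare_exchange_strong(expected, t.id_version + 2);
}

// Timer-thread side: true means the task was marked and may now be returned
// to the pool. Only this thread ever recycles the task.
bool mini_try_delete(const MiniTask& t) {
    return t.version.load(std::memory_order_relaxed) == t.id_version + 2;
}
```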
   


[GitHub] [incubator-brpc] jamesge commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
jamesge commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-663330333


   @acelyc111 Can you reproduce it in your setup at the moment?


[GitHub] [incubator-brpc] acelyc111 commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-662570960


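   Suppose a task is scheduled and then immediately unscheduled (at this point `_task_head` in the Bucket is not reset to nullptr), and another task is scheduled after that. Could the two tasks end up at the same address? If so, on the second schedule the stale `_task_head` left in the Bucket would be the same address as the newly added task, which would create a circular linked list.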
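The hypothesis can be reproduced in miniature. Pushing onto an intrusive list sets `n->next = head`. So if a recycled task gets the same address that a stale `_task_head` still holds, the push links the node to itself. `ListNode` and `MiniBucket` are hypothetical stand-ins, not brpc's `Bucket`.

```cpp
#include <cassert>

// Hypothetical intrusive list node standing in for a timer task.
struct ListNode {
    ListNode* next = nullptr;
};

// Hypothetical bucket: `head` plays the role of `_task_head`.
struct MiniBucket {
    ListNode* head = nullptr;

    // Push to the front: the new node links to whatever `head` currently holds.
    void push(ListNode* n) {
        n->next = head;
        head = n;
    }
};
```

If the hypothesis holds, a second `push` of the same address is where a self-loop like `next = 0x17ab5080` in the first dump would come from.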


[GitHub] [incubator-brpc] acelyc111 edited a comment on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 edited a comment on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-662570960


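   Suppose a task is scheduled and then immediately unscheduled (at this point `_task_head` in the Bucket is not reset to nullptr), and the task is returned to the pool. If another task is then scheduled, the two tasks may end up at the same address. On the second schedule, the stale `_task_head` left in the Bucket would be the same address as the newly added task. Could that create a circular linked list?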


[GitHub] [incubator-brpc] acelyc111 commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-662838478


   > > unschedule it immediately (at this point `_task_head` in the Bucket is not reset to nullptr), and the task is returned to the pool
   > 
   > `unschedule` only sets a [mark](https://github.com/apache/incubator-brpc/blob/e5bd064b3aed62d35b8d1f6d2456522f9558e4ee/src/bthread/timer_thread.cpp#L254); the actual deletion happens inside the TimerThread.
   
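   Then is there any situation in which the task that the Bucket's `_task_head` points to gets reused when a new task is scheduled? That seems to be the only way a task's `next` could end up pointing to itself.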


[GitHub] [incubator-brpc] vagetablechicken edited a comment on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
vagetablechicken edited a comment on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-664184102


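   Besides the OOM from `push_back` on a circular linked list while consuming tasks, we also found OOMs caused by the `TimerThread::Task` resource pool growing too fast.
   In a 20-node cluster, only one node showed the rapid growth of the Task resource pool. The figure below is the growth profile:
   ![image](https://user-images.githubusercontent.com/24697960/88516211-801a6b00-d01f-11ea-965a-43e6dffbfe5d.png)
   Although the growth profile does not account for frees, this figure is roughly equal to the RSS and to tcmalloc's `generic.current_allocated_bytes`.
   Moreover, taking another growth sample 35 min later and diffing the two shows that the resource pool grew by 22 GB in 30 min (same call path as in the figure above; TODO: figure link 151508-144059.png).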
   
   


[GitHub] [incubator-brpc] acelyc111 commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-662781453


   > What is this TimerThread used for?
   
   It is presumably used to detect whether an RPC has timed out.


[GitHub] [incubator-brpc] acelyc111 edited a comment on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 edited a comment on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-665598024






[GitHub] [incubator-brpc] acelyc111 commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-663333595


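   @jamesge We don't have a reliable way to reproduce it yet, but one 30-node cluster (a Doris cluster) hits this coredump 8 to 10 times a day. It may depend on the workload: other clusters with more nodes have not produced this core.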


[GitHub] [incubator-brpc] acelyc111 commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-665379244


   I cherry-picked master's [change](https://github.com/apache/incubator-brpc/commit/ea87ee715b04e30a4ee255953ac94cd0f6672b59) to `try_delete`. After 2 hours of observation the previous coredump has not reappeared, but a different coredump occurred:
   ```
   Core was generated by `/home/work/app/doris/c3prc-bigbi/be/package/be/lib/palo_be'.
   Program terminated with signal 11, Segmentation fault.
   #0  swap (other=..., this=<optimized out>) at /home/laiyingchun/xm_doris/thirdparty/src/incubator-brpc-0.9.5/src/butil/iobuf_inl.h:82
   82	/home/laiyingchun/xm_doris/thirdparty/src/incubator-brpc-0.9.5/src/butil/iobuf_inl.h: No such file or directory.
   Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 zlib-1.2.7-17.el7.x86_64
   (gdb) bt
   #0  swap (other=..., this=<optimized out>) at /home/laiyingchun/xm_doris/thirdparty/src/incubator-brpc-0.9.5/src/butil/iobuf_inl.h:82
   #1  brpc::Socket::Write (this=0x14ffbe00, data=data@entry=0x7f01d9b540b0, options_in=options_in@entry=0x7f01d9b54090) at /home/laiyingchun/xm_doris/thirdparty/src/incubator-brpc-0.9.5/src/brpc/socket.cpp:1519
   #2  0x0000000001b9f88e in brpc::Controller::IssueRPC (this=this@entry=0x190f02508, start_realtime_us=start_realtime_us@entry=1595939753030853) at /home/laiyingchun/xm_doris/thirdparty/src/incubator-brpc-0.9.5/src/brpc/controller.cpp:1139
   #3  0x0000000001b96427 in brpc::Channel::CallMethod (this=0x1500c200, method=0x22d23400, controller_base=0x190f02508, request=0x1491b180, response=0x190f02758, done=0x190f02500) at /home/laiyingchun/xm_doris/thirdparty/src/incubator-brpc-0.9.5/src/brpc/channel.cpp:529
   #4  0x00000000013658bf in palo::PInternalService_Stub::transmit_data (this=<optimized out>, controller=0x190f02508, request=0x1491b180, response=0x190f02758, done=0x190f02500) at /home/laiyingchun/xm_doris/gensrc/build/gen_cpp/palo_internal_service.pb.cc:319
   #5  0x00000000015fb3a1 in doris::DataStreamSender::Channel::send_batch (this=this@entry=0x1491b080, batch=batch@entry=0x0, eos=eos@entry=true) at /home/laiyingchun/xm_doris/be/src/runtime/data_stream_sender.cpp:232
   #6  0x00000000015fbf3a in doris::DataStreamSender::Channel::close_internal (this=0x1491b080) at /home/laiyingchun/xm_doris/be/src/runtime/data_stream_sender.cpp:289
   #7  0x00000000015fc115 in close (state=0x12ca4c000, this=<optimized out>) at /home/laiyingchun/xm_doris/be/src/runtime/data_stream_sender.cpp:296
   #8  doris::DataStreamSender::close (this=0x9bde2d00, state=0x12ca4c000, exec_status=...) at /home/laiyingchun/xm_doris/be/src/runtime/data_stream_sender.cpp:607
   #9  0x00000000010207d3 in doris::PlanFragmentExecutor::open_internal (this=this@entry=0x91d02b70) at /home/laiyingchun/xm_doris/be/src/runtime/plan_fragment_executor.cpp:326
   #10 0x00000000010209cc in doris::PlanFragmentExecutor::open (this=this@entry=0x91d02b70) at /home/laiyingchun/xm_doris/be/src/runtime/plan_fragment_executor.cpp:259
   #11 0x0000000000fb1167 in doris::FragmentExecState::execute (this=0x91d02b00) at /home/laiyingchun/xm_doris/be/src/runtime/fragment_mgr.cpp:211
   #12 0x0000000000fb2c16 in doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) (this=0x5fbbc00, exec_state=..., cb=...) at /home/laiyingchun/xm_doris/be/src/runtime/fragment_mgr.cpp:394
   #13 0x0000000000fb95b8 in __invoke_impl<void, void (doris::FragmentMgr::*&)(std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>), doris::FragmentMgr*&, std::shared_ptr<doris::FragmentExecState>&, std::function<void(doris::PlanFragmentExecutor*)>&> (__t=@0xa48be540: 0x5fbbc00, __f=
       @0xa48be500: (void (doris::FragmentMgr::*)(doris::FragmentMgr * const, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>)) 0xfb2bf0 <doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>) at /usr/include/c++/7.3.0/bits/invoke.h:73
   #14 __invoke<void (doris::FragmentMgr::*&)(std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>), doris::FragmentMgr*&, std::shared_ptr<doris::FragmentExecState>&, std::function<void(doris::PlanFragmentExecutor*)>&> (__fn=
       @0xa48be500: (void (doris::FragmentMgr::*)(doris::FragmentMgr * const, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>)) 0xfb2bf0 <doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>) at /usr/include/c++/7.3.0/bits/invoke.h:95
   #15 __call<void, 0, 1, 2> (__args=..., this=0xa48be500) at /usr/include/c++/7.3.0/functional:632
   #16 operator()<> (this=0xa48be500) at /usr/include/c++/7.3.0/functional:718
   #17 boost::detail::function::void_function_obj_invoker0<std::_Bind_result<void, void (doris::FragmentMgr::*(doris::FragmentMgr*, std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>))(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>, void>::invoke(boost::detail::function::function_buffer&) (function_obj_ptr=...) at /home/laiyingchun/xm_doris/thirdparty/installed/include/boost/function/function_template.hpp:159
   #18 0x0000000000fb23d4 in operator() (this=0xf2f9400) at /home/laiyingchun/xm_doris/thirdparty/installed/include/boost/function/function_template.hpp:759
   #19 doris::fragment_executor (param=0xf2f9400) at /home/laiyingchun/xm_doris/be/src/runtime/fragment_mgr.cpp:419
   #20 0x00007f0393131dc5 in start_thread () from /lib64/libpthread.so.0
   #21 0x00007f039343d73d in clone () from /lib64/libc.so.6
   (gdb)
   ```


[GitHub] [incubator-brpc] acelyc111 commented on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-663388665


   We also found some additional information that may help locate the problem. The log contains:
   ```
   [work@c3-bigdata-doris-be39 log]$ grep "timer_thread" be.INFO.log.20200723
   E0723 00:16:31.562367 100086 timer_thread.cpp:255] Invalid task_id=85899345922
   E0723 00:58:46.798079 100121 timer_thread.cpp:255] Invalid task_id=103079215106
   E0723 01:10:28.283030 100144 timer_thread.cpp:255] Invalid task_id=111669149698
   E0723 01:21:14.221813 100124 timer_thread.cpp:255] Invalid task_id=120259084290
   E0723 02:17:08.497958 100135 timer_thread.cpp:255] Invalid task_id=137438953474
   E0723 03:03:53.300819 100141 timer_thread.cpp:255] Invalid task_id=163208757250
   E0723 03:16:53.389494 100139 timer_thread.cpp:255] Invalid task_id=171798691842
   E0723 03:46:54.512308 100130 timer_thread.cpp:255] Invalid task_id=180388626434
   E0723 04:13:03.006063 100085 timer_thread.cpp:255] Invalid task_id=188978561026
   E0723 05:17:00.844795 100147 timer_thread.cpp:255] Invalid task_id=197568495618
   E0723 05:55:36.732831 100149 timer_thread.cpp:255] Invalid task_id=206158430210
   E0723 06:12:50.328419 100127 timer_thread.cpp:255] Invalid task_id=223338299394
   E0723 06:21:23.337307 100124 timer_thread.cpp:255] Invalid task_id=231928233986
   E0723 07:16:56.497378 100144 timer_thread.cpp:255] Invalid task_id=240518168578
   E0723 08:09:44.952257 100127 timer_thread.cpp:255] Invalid task_id=266287972354
   E0723 08:30:03.313336 100145 timer_thread.cpp:255] Invalid task_id=274877906946
   E0723 09:01:00.651458 100085 timer_thread.cpp:255] Invalid task_id=283467841538
   E0723 09:31:20.790068 100129 timer_thread.cpp:255] Invalid task_id=292057776130
   E0723 10:21:18.116811 100086 timer_thread.cpp:255] Invalid task_id=300647710722
   E0723 10:51:44.735030 100126 timer_thread.cpp:255] Invalid task_id=309237645314
   E0723 11:09:08.432152 100140 timer_thread.cpp:255] Invalid task_id=309237645314
   E0723 12:57:10.976786 100088 timer_thread.cpp:255] Invalid task_id=317827579906
   E0723 12:57:10.976857 100144 timer_thread.cpp:255] Invalid task_id=317827579906
   F0723 16:40:25.776798 141455 timer_thread.cpp:298] Check failed: version.load(butil::memory_order_relaxed) == id_version + 2 (1384 vs. 1380)
   ```
   The core generated at that point, viewed in gdb:
   ```
   (gdb) bt
   #0  0x00007f4aa27871d7 in raise () from /lib64/libc.so.6
   #1  0x00007f4aa27888c8 in abort () from /lib64/libc.so.6
   #2  0x0000000001a87636 in google::DumpStackTraceAndExit () at src/utilities.cc:147
   #3  0x0000000001a7e93d in google::LogMessage::Fail () at src/logging.cc:1599
   #4  0x0000000001a807c4 in google::LogMessage::SendToLog (this=0x7f498d7d4510) at src/logging.cc:1553
   #5  0x0000000001a7e464 in google::LogMessage::Flush (this=0x7f498d7d4510) at src/logging.cc:1422
   #6  0x0000000001a811f9 in google::LogMessageFatal::~LogMessageFatal (this=<optimized out>, __in_chrg=<optimized out>) at src/logging.cc:2125
   #7  0x0000000001d24088 in bthread::TimerThread::Task::try_delete (this=0x9c41ba40) at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/timer_thread.cpp:298
   #8  0x0000000001d24e4f in bthread::TimerThread::run (this=0xcd50cc0) at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/timer_thread.cpp:353
   #9  0x0000000001d25d09 in bthread::TimerThread::run_this (arg=<optimized out>) at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/timer_thread.cpp:117
   #10 0x00007f4aa253ddc5 in start_thread () from /lib64/libpthread.so.0
   #11 0x00007f4aa284973d in clone () from /lib64/libc.so.6
   (gdb) f 7
   #7  0x0000000001d24088 in bthread::TimerThread::Task::try_delete (this=0x9c41ba40) at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/timer_thread.cpp:298
   298	in /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/timer_thread.cpp
   (gdb) p this
   $12 = (bthread::TimerThread::Task * const) 0x9c41ba40
   (gdb) p *this
   $11 = {next = 0x9c41ba40, run_time = 1595495425823580, fn = 0x1b953c0 <brpc::HandleTimeout(void*)>, arg = 0x1389000017a1, task_id = 6064493992809, version = {<std::atomic<unsigned int>> = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4,
           _M_i = 1414}, <No data fields>}, <No data fields>}}
   (gdb)
   ```
   `this->next` also points to itself, and `version` is 1414, which does not match the values in the log.
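The task ids in the log and in both dumps can be split into a resource-pool slot and a version. This sketch assumes the id packs the version into the high 32 bits and the slot into the low 32 bits. That appears to be how brpc 0.9.5's `timer_thread.cpp` builds them, but please check against the source. `decode_version` and `decode_slot` are hypothetical helper names.

```cpp
#include <cstdint>

// Assumption: task_id = (version << 32) | slot.
constexpr std::uint32_t decode_version(std::uint64_t task_id) {
    return static_cast<std::uint32_t>(task_id >> 32);
}

// Low 32 bits: the resource-pool slot that holds the Task.
constexpr std::uint32_t decode_slot(std::uint64_t task_id) {
    return static_cast<std::uint32_t>(task_id & 0xFFFFFFFFULL);
}

// First core dump: task_id = 249108103170 while the version field is 58.
static_assert(decode_version(249108103170ULL) == 58, "matches _M_i = 58");
static_assert(decode_slot(249108103170ULL) == 2, "slot 2");
```

Under this decoding:
- The first dump's `task_id = 249108103170` gives version 58, matching `_M_i = 58`.
- Every `Invalid task_id` in the log decodes to slot 2, with an even version between 20 and 74.
- The second dump's `task_id = 6064493992809` decodes to slot 170857 and version 1412, two less than the current `version` of 1414.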


[GitHub] [incubator-brpc] acelyc111 edited a comment on issue #1179: tcmalloc OOM coredump in the TimerThread module

Posted by GitBox <gi...@apache.org>.
acelyc111 edited a comment on issue #1179:
URL: https://github.com/apache/incubator-brpc/issues/1179#issuecomment-662570960


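   Suppose a task is scheduled and then immediately unscheduled (at this point `_task_head` in the Bucket is not reset to nullptr), and the task is returned to the pool. If another task is then scheduled, the two tasks may end up at the same address, so on the second schedule the stale `_task_head` left in the Bucket would be the same address as the newly added task. Could that create a circular linked list?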

