Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/06/23 07:28:51 UTC

[GitHub] [incubator-doris] morningman opened a new issue #3929: [Bug] BE crash occasionally

morningman opened a new issue #3929:
URL: https://github.com/apache/incubator-doris/issues/3929


   **Describe the bug**
   BE crashes occasionally, and `be.out` shows:
   
   ```
   palo_be: ../nptl/pthread_mutex_lock.c:80: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
   *** Aborted at 1592890873 (unix time) try "date -d @1592890873" if you are using GNU date ***
   PC: @     0x7f599a2af3f7 __GI_raise
   *** SIGABRT (@0x1f4000081a6) received by PID 33190 (TID 0x7f59797b6700) from PID 33190; stack trace: ***
       @     0x7f599a2af470 (unknown)
       @     0x7f599a2af3f7 __GI_raise
       @     0x7f599a2b07d8 __GI_abort
       @     0x7f599a2a8516 __assert_fail_base
       @     0x7f599a2a85c2 __GI___assert_fail
       @     0x7f599a06658c __GI___pthread_mutex_lock
       @          0x1ba34d6 pthread_mutex_lock
       @          0x145f4ac doris::OlapScanNode::scanner_thread()
       @           0xfa8a35 doris::PriorityThreadPool::work_thread()
       @          0x1a5bbed thread_proxy
       @     0x7f599a0641c3 start_thread
       @     0x7f599a36112d __clone
   ```
   
   The abort comes from an assertion inside `pthread_mutex_lock`: it expected `mutex->__data.__owner == 0` for the mutex being locked, but at that moment the owner field was not 0.
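
   To make the failing check concrete, here is a minimal sketch of the bookkeeping a normal (non-recursive) mutex does around `__owner`. `ModelMutex`, `model_lock` and `model_unlock` are hypothetical names used only for illustration; this is a simplified model, not the glibc source, and it ignores atomics and futexes.

   ```cpp
   #include <cassert>

   // Hypothetical, simplified model of the fields gdb prints for pthread_mutex_t.
   struct ModelMutex {
       int lock = 0;         // 0 = free, 1 = held
       int owner = 0;        // id of the holding thread, 0 when free
       unsigned nusers = 0;  // number of current holders
   };

   // Returns false at the point where the real lock path would hit
   // the assertion `mutex->__data.__owner == 0`.
   bool model_lock(ModelMutex& m, int tid) {
       m.lock = 1;            // stands in for the atomic acquire
       if (m.owner != 0) {
           return false;      // a freshly acquired mutex must have no owner
       }
       m.owner = tid;
       ++m.nusers;
       return true;
   }

   // Clears the owner and the holder count before releasing the lock.
   void model_unlock(ModelMutex& m) {
       m.owner = 0;
       --m.nusers;
       m.lock = 0;
   }
   ```

   In this model the check can only fail if something wrote `owner` without going through the lock, for example another thread racing on the same memory or the memory being reused. Which of these, if any, happened here is not established.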
   
   But when I look into the core dump file, the `__owner` field of that mutex is 0.
   
   ```
   #0  0x00007f599a2af3f7 in raise () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
   #1  0x00007f599a2b07d8 in abort () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
   #2  0x00007f599a2a8516 in __assert_fail_base () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
   #3  0x00007f599a2a85c2 in __assert_fail () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
   #4  0x00007f599a06658c in pthread_mutex_lock () from /opt/compiler/gcc-4.8.2/lib64/libpthread.so.0
   #5  0x0000000001ba34d6 in pthread_mutex_lock_impl (mutex=0x67521610) at /home/palo/thirdparty/src/incubator-brpc-0.9.5/src/bthread/mutex.cpp:551
   #6  pthread_mutex_lock (__mutex=__mutex@entry=0x67521610) at /home/palo/thirdparty/src/incubator-brpc-0.9.5/src/bthread/mutex.cpp:809
   #7  0x000000000145f4ac in pthread_mutex_scoped_lock (m_=0x67521610, this=<synthetic pointer>) at /home/palo/thirdparty/installed/include/boost/thread/pthread/pthread_mutex_scoped_lock.hpp:26
   #8  notify_one (this=0x67521610) at /home/palo/thirdparty/installed/include/boost/thread/pthread/condition_variable.hpp:126
   #9  doris::OlapScanNode::scanner_thread (this=0x67521000, scanner=0x20200bd40) at /home/palo/be/src/exec/olap_scan_node.cpp:1322
   #10 0x0000000000fa8a35 in operator() (this=0x7f59797b2828) at /home/palo/thirdparty/installed/include/boost/function/function_template.hpp:759
   #11 doris::PriorityThreadPool::work_thread (this=0x50ac300, thread_id=<optimized out>) at /home/palo/be/src/util/priority_thread_pool.hpp:138
   #12 0x0000000001a5bbed in thread_proxy ()
   #13 0x00007f599a0641c3 in start_thread () from /opt/compiler/gcc-4.8.2/lib64/libpthread.so.0
   #14 0x00007f599a36112d in clone () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
   (gdb) f 5
   #5  0x0000000001ba34d6 in pthread_mutex_lock_impl (mutex=0x67521610) at /home/palo/thirdparty/src/incubator-brpc-0.9.5/src/bthread/mutex.cpp:551
   551	/home/palo/thirdparty/src/incubator-brpc-0.9.5/src/bthread/mutex.cpp: No such file or directory.
   (gdb) p mutex
   $1 = (pthread_mutex_t *) 0x67521610
   (gdb) p *mutex
   $2 = {
     __data = {
       __lock = 0,
       __count = 0,
       __owner = 0,
       __nusers = 4294967295,
       __kind = 0,
       __spins = 0,
       __elision = 0,
       __list = {
         __prev = 0x0,
         __next = 0x0
       }
     },
     __size = '\000' <repeats 12 times>, "\377\377\377\377", '\000' <repeats 23 times>,
     __align = 0
   }
   ```
   
   That mutex is an internal mutex of `boost::condition_variable`. I have no idea why.
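
   One detail in the dump that may be worth noting: `__nusers = 4294967295` is the unsigned 32-bit representation of -1. The snippet below only checks that arithmetic. Reading it as "one more decrement than increment" is an assumption; memory corruption or use of an already destroyed object would be other candidates. `decrement_from_zero` and `as_signed` are hypothetical helper names.

   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <cstring>

   // Decrementing an unsigned 32-bit counter that is already 0 wraps around.
   inline std::uint32_t decrement_from_zero() {
       std::uint32_t nusers = 0;
       --nusers;  // well-defined for unsigned types: wraps modulo 2^32
       return nusers;
   }

   // Reinterpret the dumped bit pattern as a signed 32-bit integer.
   inline std::int32_t as_signed(std::uint32_t v) {
       std::int32_t s;
       std::memcpy(&s, &v, sizeof s);  // copy the bits, no value conversion
       return s;
   }
   ```

   With these helpers, `decrement_from_zero()` yields the value seen in the dump, and `as_signed(4294967295u)` yields -1.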


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] vagetablechicken removed a comment on issue #3929: [Bug] BE crash occasionally

Posted by GitBox <gi...@apache.org>.
vagetablechicken removed a comment on issue #3929:
URL: https://github.com/apache/incubator-doris/issues/3929#issuecomment-648548777


   Maybe `_scan_batches_lock` is released too far from where `_scan_batch_added_cv` is notified?
   https://github.com/apache/incubator-doris/blob/b8ee84a120813181733c6ac8cd5b9fd3c9b2f674/be/src/exec/olap_scan_node.cpp#L1278-L1280
   https://github.com/apache/incubator-doris/blob/b8ee84a120813181733c6ac8cd5b9fd3c9b2f674/be/src/exec/olap_scan_node.cpp#L1310-L1322
   We unlock `_scan_batches_lock` and then call `notify_one()`? So the thread calling `notify_one()` does not own the lock?
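
   For reference, the condition variable contract allows calling `notify_one()` after the user-level lock has been released: the notifier does not have to hold the mutex that the waiters use. A minimal sketch, with `std::mutex` and `std::condition_variable` standing in for the Boost types; `Queue`, `push`, `pop` and `round_trip` are hypothetical names, not the Doris code.

   ```cpp
   #include <cassert>
   #include <condition_variable>
   #include <deque>
   #include <mutex>
   #include <thread>

   // Hypothetical stand-in for a batch queue guarded by a lock and a cv.
   class Queue {
   public:
       void push(int v) {
           {
               std::lock_guard<std::mutex> l(_lock);
               _items.push_back(v);
           }                        // user-level lock released here
           _added_cv.notify_one();  // legal without holding _lock
       }

       int pop() {
           std::unique_lock<std::mutex> l(_lock);
           // The predicate guards against spurious and early wakeups.
           _added_cv.wait(l, [this] { return !_items.empty(); });
           int v = _items.front();
           _items.pop_front();
           return v;
       }

   private:
       std::mutex _lock;
       std::condition_variable _added_cv;
       std::deque<int> _items;
   };

   // Runs one producer thread and consumes its value on the caller's thread.
   inline int round_trip(int v) {
       Queue q;
       std::thread producer([&q, v] { q.push(v); });
       int got = q.pop();
       producer.join();  // q must outlive the thread that notifies on it
       return got;
   }
   ```

   What the sketch does rely on is that the queue outlives the notifier: `round_trip` joins the producer before `q` is destroyed. Whether object lifetime plays a role in this crash is not established here.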




[GitHub] [incubator-doris] vagetablechicken commented on issue #3929: [Bug] BE crash occasionally

Posted by GitBox <gi...@apache.org>.
vagetablechicken commented on issue #3929:
URL: https://github.com/apache/incubator-doris/issues/3929#issuecomment-648565350


   The binary uses the `pthread_mutex_lock` from `/home/palo/thirdparty/src/incubator-brpc-0.9.5/src/bthread/mutex.cpp:809`; could that be closely related to this coredump?





