You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by GitBox <gi...@apache.org> on 2021/03/05 20:50:04 UTC

[GitHub] [trafficserver] bdgranger opened a new issue #7581: Sporadic crash and core dump in Mutex_trylock

bdgranger opened a new issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581


   We're seeing a very sporadic crash and core dump at customer site with production traffic load.  In every instance gathered thus far, the location of the segfault is the same:
   
   ```c++
   Core was generated by '/opt/trafficserver/bin/traffic_server --bind_stdout /opt/trafficserver/var/log/'.
   Program terminated with signal 11, Segmentation fault.
   #0  0x00000000007098c2 in Mutex_trylock (t=0x2b521e086010, m=<optimized out>)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:287
   287       if (m->thread_holding != t) {
   ```
   
   There are various stack trace paths, but all of them eventually end up going through manage_keep_alive_queue before hitting the Mutex_trylock.  Most of them have also gone through Http2ConnectionState::release_stream() although one of them goes through InactivityCop::check_inactivity().
   
   We are currently on trafficserver 8.0.7, which is where these traces come from. We do have the code referenced in #4504. We are in the process of merging to the tip of the 8.1.x branch, but thought we'd ask here to see if we can/should expect these issues to be resolved with the Http2 changes in 8.1.x? 
   
   Unfortunately, I am unable to fetch the history via vc->history as this appears to have been optimized out by the compiler and we can't run debug code in this environment. We have not yet been able to reproduce on demand in our lab.
   
   I am including samples of various stack traces retrieved. Would appreciate any thoughts!
   
   ```c++
   #0  0x00000000007098c2 in Mutex_trylock (t=0x2b521e086010, m=<optimized out>)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:287
   #1  MutexTryLock (t=0x2b521e086010, am=..., this=0x2b5224f59a10)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:555
   #2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
       handle_event=<synthetic pointer>, now=1614298436718203540, vc=<optimized out>, this=0x2b521df7ccf0) at UnixNet.cc:644
   #3  manage_keep_alive_queue (this=0x2b521df7ccf0) at UnixNet.cc:622
   #4  NetHandler::add_to_keep_alive_queue (this=0x2b521df7ccf0, vc=<optimized out>) at UnixNet.cc:688
   #5  0x000000000058d919 in Http2ConnectionState::release_stream (this=0x2b5598c7b550, stream=<optimized out>)
       at Http2ConnectionState.cc:1350
   #6  0x0000000000577fe5 in Http2Stream::destroy (this=0x2b5737147a80) at Http2Stream.cc:749
   #7  0x00000000005742f9 in Http2Stream::transaction_done (this=0x2b5737147a80) at Http2Stream.cc:391
   #8  0x0000000000505a9c in HttpSM::kill_this (this=this@entry=0x2b570d0e18d0) at HttpSM.cc:6849
   #9  0x0000000000506257 in HttpSM::main_handler (this=0x2b570d0e18d0, event=2301, data=0x2b570d0e27d0) at HttpSM.cc:2559
   #10 0x00000000005552a1 in handleEvent (data=0x2b570d0e27d0, event=2301, this=<optimized out>)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160
   #11 HttpTunnel::main_handler (this=0x2b570d0e27d0, event=<optimized out>, data=<optimized out>) at HttpTunnel.cc:1692
   #12 0x0000000000676962 in handleEvent (data=0x2b5299961590, event=103, this=<optimized out>)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160
   #13 calluser (event=103, this=0x2b5299961340) at P_CacheInternal.h:628
   #14 CacheVC::openWriteMain (this=0x2b5299961340) at CacheWrite.cc:1393
   #15 0x0000000000752b52 in handleEvent (data=0x2b5228073720, event=1, this=<optimized out>) at I_Continuation.h:160
   #16 EThread::process_event (this=this@entry=0x2b521e086010, e=e@entry=0x2b5228073720, calling_code=1) at UnixEThread.cc:131
   #17 0x000000000075341e in EThread::process_queue (this=this@entry=0x2b521e086010,
       NegativeQueue=NegativeQueue@entry=0x2b5224f59e70, ev_count=ev_count@entry=0x2b5224f59e6c,
       nq_count=nq_count@entry=0x2b5224f59e68) at UnixEThread.cc:170
   #18 0x0000000000753a08 in EThread::execute_regular (this=0x2b521e086010) at UnixEThread.cc:230
   #19 0x000000000075254a in spawn_thread_internal (a=0x1c37920) at Thread.cc:85
   #20 0x00002b52159d9e65 in start_thread () from /lib64/libpthread.so.0
   #21 0x00002b521670f88d in clone () from /lib64/libc.so.6
   ```
   =================
   ```c++
   #0  0x000000000070a07a in Mutex_trylock (t=0x2b058e939010, m=<optimized out>)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:287
   #1  MutexTryLock (t=0x2b058e939010, am=..., this=0x2b05950bcc90)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:555
   #2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
       handle_event=<synthetic pointer>, now=1614215343299903804, vc=<optimized out>, this=0x2b058e93ccf0) at UnixNet.cc:644
   #3  NetHandler::manage_keep_alive_queue (this=0x2b058e93ccf0) at UnixNet.cc:622
   #4  0x000000000070ac02 in InactivityCop::check_inactivity (this=0x2b05e80008c0, event=<optimized out>, e=<optimized out>)
       at UnixNet.cc:99
   #5  0x0000000000752b52 in handleEvent (data=0x1323f80, event=2, this=<optimized out>) at I_Continuation.h:160
   #6  EThread::process_event (this=this@entry=0x2b058e939010, e=e@entry=0x1323f80, calling_code=2) at UnixEThread.cc:131
   #7  0x0000000000753d63 in EThread::execute_regular (this=0x2b058e939010) at UnixEThread.cc:244
   #8  0x000000000075254a in spawn_thread_internal (a=0x131eda0) at Thread.cc:85
   #9  0x00002b058af6ee65 in start_thread () from /lib64/libpthread.so.0
   #10 0x00002b058bca488d in clone () from /lib64/libc.so.6
   ```
   ===============
   ```c++
   #0  0x00000000007098c2 in Mutex_trylock (t=0x2adb1db45010, m=<optimized out>)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:287
   #1  MutexTryLock (t=0x2adb1db45010, am=..., this=0x2adb245e8b90)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:555
   #2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
       handle_event=<synthetic pointer>, now=1613916233686290533, vc=<optimized out>, this=0x2adb1db48cf0) at UnixNet.cc:644
   #3  manage_keep_alive_queue (this=0x2adb1db48cf0) at UnixNet.cc:622
   #4  NetHandler::add_to_keep_alive_queue (this=0x2adb1db48cf0, vc=<optimized out>) at UnixNet.cc:688
   #5  0x000000000058d919 in Http2ConnectionState::release_stream (this=0x2adff4d97610, stream=<optimized out>)
       at Http2ConnectionState.cc:1350
   #6  0x0000000000577fe5 in Http2Stream::destroy (this=0x2ae0374c89c0) at Http2Stream.cc:749
   #7  0x00000000005784bb in Http2Stream::terminate_if_possible (this=0x2ae0374c89c0) at Http2Stream.cc:391
   #8  0x000000000057871c in Http2Stream::main_event_handler (this=0x2ae0374c89c0, event=<optimized out>, edata=0x2ae01953f5a0)
       at Http2Stream.cc:142
   #9  0x0000000000752b52 in handleEvent (data=0x2ae01953f5a0, event=104, this=<optimized out>) at I_Continuation.h:160
   #10 EThread::process_event (this=this@entry=0x2adb1db45010, e=e@entry=0x2ae01953f5a0, calling_code=104) at UnixEThread.cc:131
   #11 0x000000000075341e in EThread::process_queue (this=this@entry=0x2adb1db45010,
       NegativeQueue=NegativeQueue@entry=0x2adb245e8e70, ev_count=ev_count@entry=0x2adb245e8e6c,
       nq_count=nq_count@entry=0x2adb245e8e68) at UnixEThread.cc:170
   #12 0x0000000000753a08 in EThread::execute_regular (this=0x2adb1db45010) at UnixEThread.cc:230
   #13 0x000000000075254a in spawn_thread_internal (a=0x10beb10) at Thread.cc:85
   #14 0x00002adb1594de65 in start_thread () from /lib64/libpthread.so.0
   #15 0x00002adb1668388d in clone () from /lib64/libc.so.6
   ```
   ================
   ```c++
   #0  0x00000000007098c2 in Mutex_trylock (t=0x2af54b69b010, m=<optimized out>)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:287
   #1  MutexTryLock (t=0x2af54b69b010, am=..., this=0x2af55e7051b0)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Lock.h:555
   #2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
       handle_event=<synthetic pointer>, now=1613927876215693237, vc=<optimized out>, this=0x2af54b69ecf0) at UnixNet.cc:644
   #3  manage_keep_alive_queue (this=0x2af54b69ecf0) at UnixNet.cc:622
   #4  NetHandler::add_to_keep_alive_queue (this=0x2af54b69ecf0, vc=<optimized out>) at UnixNet.cc:688
   #5  0x000000000058d919 in Http2ConnectionState::release_stream (this=0x2afa08250890, stream=<optimized out>)
       at Http2ConnectionState.cc:1350
   #6  0x0000000000577fe5 in Http2Stream::destroy (this=0x2af8a731ffc0) at Http2Stream.cc:749
   #7  0x00000000005742f9 in Http2Stream::transaction_done (this=0x2af8a731ffc0) at Http2Stream.cc:391
   #8  0x0000000000505a9c in HttpSM::kill_this (this=this@entry=0x2afa245e6d20) at HttpSM.cc:6849
   #9  0x0000000000506257 in HttpSM::main_handler (this=0x2afa245e6d20, event=100, data=0x2af8a7320230) at HttpSM.cc:2559
   #10 0x000000000057501e in Http2Stream::update_read_request (this=this@entry=0x2af8a731ffc0,
       read_len=read_len@entry=9223372036854775807, call_update=call_update@entry=true, check_eos=check_eos@entry=false)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160
   #11 0x0000000000575401 in Http2Stream::send_request (this=this@entry=0x2af8a731ffc0, cstate=...) at Http2Stream.cc:183
   #12 0x000000000058fbf3 in rcv_headers_frame (cstate=..., frame=...) at Http2ConnectionState.cc:344
   #13 0x0000000000591e59 in Http2ConnectionState::main_event_handler (this=0x2afa08250890, event=<optimized out>,
       edata=<optimized out>) at Http2ConnectionState.cc:983
   #14 0x0000000000585f6f in handleEvent (data=0x2af55e705720, event=2253, this=0x2afa08250890)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160
   #15 send_connection_event (cont=cont@entry=0x2afa08250890, event=event@entry=2253, edata=edata@entry=0x2af55e705720)
       at Http2ClientSession.cc:58
   #16 0x0000000000586165 in Http2ClientSession::do_complete_frame_read (this=this@entry=0x2afa082505c0)
       at Http2ClientSession.cc:541
   #17 0x000000000058650d in Http2ClientSession::state_process_frame_read (this=this@entry=0x2afa082505c0, event=event@entry=100,
       vio=vio@entry=0x2af6b5ade6f8, inside_frame=inside_frame@entry=false) at Http2ClientSession.cc:594
   #18 0x0000000000586d01 in Http2ClientSession::state_start_frame_read (this=0x2afa082505c0, event=100, edata=0x2af6b5ade6f8)
       at Http2ClientSession.cc:462
   #19 0x0000000000587540 in Http2ClientSession::main_event_handler (this=0x2afa082505c0, event=100, edata=0x2af6b5ade6f8)
       at Http2ClientSession.cc:331
   #20 0x00000000007197a2 in handleEvent (data=0x2af6b5ade6f8, event=100, this=<optimized out>)
       at /usr/src/debug/trafficserver-8.0.7/iocore/eventsystem/I_Continuation.h:160
   #21 read_signal_and_update (vc=0x2af6b5ade560, vc@entry=0x1, event=event@entry=100) at UnixNetVConnection.cc:83
   #22 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2af6b5ade560, event=event@entry=100) at UnixNetVConnection.cc:1064
   #23 0x00000000006e6350 in SSLNetVConnection::net_read_io (this=0x2af6b5ade560, nh=0x2af54b69ecf0, lthread=<optimized out>)
       at SSLNetVConnection.cc:654
   #24 0x0000000000706e44 in NetHandler::process_ready_list (this=this@entry=0x2af54b69ecf0) at UnixNet.cc:395
   #25 0x000000000070711d in NetHandler::waitForActivity (this=0x2af54b69ecf0, timeout=<optimized out>) at UnixNet.cc:528
   #26 0x0000000000753f49 in EThread::execute_regular (this=0x2af54b69b010) at UnixEThread.cc:272
   #27 0x000000000075254a in spawn_thread_internal (a=0x2a454d0) at Thread.cc:85
   #28 0x00002af546c00e65 in start_thread () from /lib64/libpthread.so.0
   #29 0x00002af54793688d in clone () from /lib64/libc.so.6
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bdgranger commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
bdgranger commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-792986827


   @masaori335 - thank you. We saw many http2 changes in 8.1.x.  We will update if the issue continues.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich edited a comment on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
shinrich edited a comment on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-800683984


   Looking through our internal jira's, it looks like we battled through with this issue in Feb 2019.  @a-canary put up PR #5056 to address the issue which I closed for some reason.  
   
   Looking for PR's that address a similar issue around the same time period, I ran across PR #6396 which is in the 9 branch but not the 8.1 branch.  You might give that a try.
   
   Hmm, I now see that is a full year later, but it does seem related to what you are seeing.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
shinrich commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-797092519


   Looks like the issues @masaori335 identified from H2 applying to the keep_alive_queue are identified in issue #4504.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich edited a comment on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
shinrich edited a comment on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-800683984


   Looking through our internal jira's, it looks like we battled through with this issue in Feb 2019.  @a-canary put up PR #5056 to address the issue which I closed for some reason.  
   
   Looking for PR's that address a similar issue around the same time period, I ran across PR #6396 which is in the 9 branch but not the 8.1 branch.  You might give that a try.
   
   Hmm, I now see that is a full year later, and not directly related to the keep_alive/active queue logic.  So while a good fix, I'll poke around some more.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bdgranger commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
bdgranger commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-871429679


   After the additional merging of  #5952 and #6686 into our 8.1.x branch, all the cores have been resolved, and have not been seen in weeks of running in production at the customer site. Closing this issue.
   
   Thanks for all the help!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
masaori335 commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-792375292


   > We are currently on trafficserver 8.0.7, which is where these traces come from. We do have the code referenced in #4504. We are in the process of merging to the tip of the 8.1.x branch, but thought we'd ask here to see if we can/should expect these issues to be resolved with the Http2 changes in 8.1.x?
   
   This might be fixed because the 8.1.x branch has various changes for the stability of HTTP/2. 
   Please let us know if 8.1.x still has this crash. FWIW, I haven't seen this crash with 8.1.2 in our env.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bdgranger closed issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
bdgranger closed issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bdgranger commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
bdgranger commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-814210091


   @shinrich thank you for the various pointers. We have pulled/ported #4379, #5120, and #5089, along with the parts of #5198 and #5201 having to do with thread_affinity into our 8.1.x fork. With this build deployed in the production environment, we have not seen any of the above stack traces in nearly a week at production traffic levels.
   
   However, on a couple of our busiest servers, we are now seeing the following stack traces once every 2 or 3 days, which both look like they may be related to #5952 and #6686. Would you agree?
   ```c++
   (gdb) bt full
   #0  Mutex_trylock (t=0x2b09f810d010, m=<optimized out>) at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:287
   No locals.
   #1  MutexTryLock (t=0x2b09f810d010, am=..., this=0x2b09fa602d80) at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:555
   No locals.
   #2  HostDBContinuation::probeEvent (this=this@entry=0x2b152e16dee0, e=e@entry=0x0) at HostDB.cc:1579
           t = 0x2b09f810d010
           lock = {m = {m_ptr = 0x0}, lock_acquired = true}
   #3  0x0000000000620096 in HostDBContinuation::dnsPendingEvent (this=0x2b152e16dee0, event=1, e=0x2b10a394f840) at HostDB.cc:1221
   No locals.
   #4  0x0000000000765082 in handleEvent (data=0x2b10a394f840, event=1, this=<optimized out>) at I_Continuation.h:187
   No locals.
   #5  EThread::process_event (this=this@entry=0x2b09f810d010, e=e@entry=0x2b10a394f840, calling_code=1) at UnixEThread.cc:131
           c_temp = <optimized out>
           lock = {m = {m_ptr = 0x21f1b20}, lock_acquired = true}
   #6  0x00000000007662a3 in EThread::execute_regular (this=0x2b09f810d010) at UnixEThread.cc:244
           done_one = true
           sleep_time = <optimized out>
           e = 0x2b10a394f840
           NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0}
           next_time = <optimized out>
           delta = <optimized out>
           prev_metric = 0x2b09f82178c0
           nq_count = 927
           ev_count = 927
           METRIC_INIT = {_loop_time = {_start = 0, _min = 9223372036854775807, _max = 0}, _events = {_min = 2147483647, _max = 0, _total = 0},
             _count = 0, _wait = 0}
   #7  0x0000000000764a7a in spawn_thread_internal (a=0x1de1670) at Thread.cc:85
           p = 0x1de1670
   #8  0x00002b09f087be65 in start_thread () from /lib64/libpthread.so.0
   No symbol table info available.
   #9  0x00002b09f15b188d in clone () from /lib64/libc.so.6
   No symbol table info available.
   ```
   ```c++
   (gdb) bt full
   #0  ink_atomiclist_push (l=l@entry=0x100b10, item=item@entry=0x2aec0b3f7da0) at ink_queue.cc:533
           adr_of_next = 0x0
           h = <optimized out>
   #1  0x0000000000764181 in ProtectedQueue::enqueue (this=this@entry=0x100b10, e=e@entry=0x2aec0b3f7da0, fast_signal=fast_signal@entry=false)
       at ProtectedQueue.cc:52
           e_ethread = 0x0
           was_empty = <optimized out>
   #2  0x000000000062bf89 in schedule (fast_signal=false, e=<optimized out>, this=0x0)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/P_UnixEThread.h:98
   No locals.
   #3  EThread::schedule_in (this=0x0, cont=0x2aec34bb8b60, t=<optimized out>, callback_event=2, cookie=0x0)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/P_UnixEThread.h:71
   No locals.
   #4  0x000000000061d20f in HostDBContinuation::do_dns (this=this@entry=0x2aec34bb8b60) at HostDB.cc:1704
           __FUNCTION__ = "do_dns"
   #5  0x000000000061ffda in HostDBContinuation::probeEvent (this=this@entry=0x2aec34bb8b60, e=e@entry=0x0) at HostDB.cc:1620
           t = <optimized out>
           lock = {m = {m_ptr = 0x2aec2381aa80}, lock_acquired = true}
   #6  0x0000000000620096 in HostDBContinuation::dnsPendingEvent (this=0x2aec34bb8b60, event=1, e=0x2aec753d6b80) at HostDB.cc:1221
   No locals.
   #7  0x0000000000765082 in handleEvent (data=0x2aec753d6b80, event=1, this=<optimized out>) at I_Continuation.h:187
   No locals.
   #8  EThread::process_event (this=this@entry=0x2ae11421a010, e=e@entry=0x2aec753d6b80, calling_code=1) at UnixEThread.cc:131
           c_temp = <optimized out>
           lock = {m = {m_ptr = 0x1cf88f0}, lock_acquired = true}
   #9  0x000000000076594e in EThread::process_queue (this=this@entry=0x2ae11421a010, NegativeQueue=NegativeQueue@entry=0x2ae116967e70,
       ev_count=ev_count@entry=0x2ae116967e6c, nq_count=nq_count@entry=0x2ae116967e68) at UnixEThread.cc:170
           e = 0x2aec753d6b80
   #10 0x0000000000765f38 in EThread::execute_regular (this=0x2ae11421a010) at UnixEThread.cc:230
           done_one = <optimized out>
           sleep_time = <optimized out>
           e = <optimized out>
           NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0}
           next_time = <optimized out>
           delta = <optimized out>
           prev_metric = 0x2ae11431c550
           nq_count = 340
           ev_count = 341
           METRIC_INIT = {_loop_time = {_start = 0, _min = 9223372036854775807, _max = 0}, _events = {_min = 2147483647, _max = 0, _total = 0},
             _count = 0, _wait = 0}
   #11 0x0000000000764a7a in spawn_thread_internal (a=0x18f2f60) at Thread.cc:85
           p = 0x18f2f60
   #12 0x00002ae10c589e65 in start_thread () from /lib64/libpthread.so.0
   No symbol table info available.
   #13 0x00002ae10d2bf88d in clone () from /lib64/libc.so.6
   No symbol table info available.
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bdgranger commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
bdgranger commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-800532421


   @masaori335 @shinrich we are still seeing crashes near the tip of the 8.1.x branch. They are not all related to Http2, but they do all go through either add_to_keep_alive_queue or manage_keep_alive_queue and in every case it appears that the m_ptr of the mutex is null.
   
   @shinrich we do have the code from #4936 that was to address #4504. However, we do not have the assert code from #4379, which is also not in the 8.1.x branch. It seems like it would be good to pick that up.
   
   Over the past week or so in this production environment, we have seen more than a dozen crashes spread across multiple servers, although not all the servers in the deployment. It appears to happen only under production load. The crashes fall fairly evenly across 4 call stacks, all of which eventually end up trying to add to or manage the keep_alive queue:
   
   * Http2Stream::destroy => Http2ConnectionState::release_stream => add_to_keep_alive_queue
   * InactivityCop::check_inactivity => manage_keep_alive_queue
   * Http1ClientSession::start => add_to_keep_alive_queue
   * Http1ClientSession::release => add_to_keep_alive_queue
   
   Will add some example stack traces and one history dump below.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
shinrich commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-797090934


   PR #4379 addressed adding an assert to catch when this code was being entered without being locked.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bdgranger commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
bdgranger commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-800539475


   The bt from the InactivityCop crash:
   
   ```c++
   (gdb) bt
   #0 0x000000000071883a in Mutex_trylock (t=0x2b54d4541010, m=<optimized out>) at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:287
   #1 MutexTryLock (t=0x2b54d4541010, am=..., this=0x2b54d6a36c90) at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:555
   #2 _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>, handle_event=<synthetic pointer>, now=1615761671786775338,
   vc=<optimized out>, this=0x2b54d4544cf0) at UnixNet.cc:648
   #3 NetHandler::manage_keep_alive_queue (this=0x2b54d4544cf0) at UnixNet.cc:626
   #4 0x0000000000719372 in InactivityCop::check_inactivity (this=0x2b55080008c0, event=<optimized out>, e=<optimized out>) at UnixNet.cc:103
   #5 0x0000000000760912 in handleEvent (data=0x1abbda0, event=2, this=<optimized out>) at I_Continuation.h:160
   #6 EThread::process_event (this=this@entry=0x2b54d4541010, e=e@entry=0x1abbda0, calling_code=2) at UnixEThread.cc:131
   #7 0x0000000000761b33 in EThread::execute_regular (this=0x2b54d4541010) at UnixEThread.cc:244
   #8 0x000000000076030a in spawn_thread_internal (a=0x1ab6da0) at Thread.cc:85
   #9 0x00002b54cccb5e65 in start_thread () from /lib64/libpthread.so.0
   #10 0x00002b54cd9eb88d in clone () from /lib64/libc.so.6
   ```
   
   Still trying to download the core to get more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bdgranger edited a comment on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
bdgranger edited a comment on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-800532421


   @masaori335 @shinrich we are still seeing crashes near the tip of the 8.1.x branch. They are not all related to Http2, but they do all go through either add_to_keep_alive_queue or manage_keep_alive_queue and in every case it appears that the m_ptr of the mutex is null.
   
   @shinrich we do have the code from #4936 that was to address #4504. However, we do not have the assert code from #4379, which is also not in the 8.1.x branch. It seems like it would be good to pick that up.
   
   Over the past week or so in this production environment, we have seen more than a dozen crashes spread across multiple servers, although not all the servers in the deployment. It appears to happen only under production load. The crashes fall fairly evenly across 4 call stacks, all of which eventually end up trying to add to or manage the keep_alive queue:
   
   * Http2Stream::destroy => Http2ConnectionState::release_stream => add_to_keep_alive_queue
   * InactivityCop::check_inactivity => manage_keep_alive_queue
   * Http1ClientSession::start => add_to_keep_alive_queue
   * Http1ClientSession::release => add_to_keep_alive_queue
   
   Will add some example stack traces and one history dump below.
   
   Please advise if there is some specific information I should be trying to pull out of these core files (takes a while to download them) to narrow down what might be happening.
   
   Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
shinrich commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-800683984


   Looking through our internal jira's, it looks like we battled through with this issue in Feb 2019.  @a-canary put up PR #5056 to address the issue which I closed for some reason.  
   
   Looking for PR's that address a similar issue around the same time period, I ran across PR #6396 which is in the 9 branch but not the 8.1 branch.  You might give that a try.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
shinrich commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-795768363


   There have also been several fixes addressing locking around the active queue/keep alive queues.  I am not sure which of those fixes have been back ported to 8.x.  I have only been tracking 9.0.x


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
masaori335 commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-872597586


   @ezelkow1 Another candidate for 8.1 backports ( #5952 and #6686 )


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
shinrich commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-800691404


   More related seems PR #5120. Based on my git log search doesn't seem to have landed on 8.1.
   
   That PR also references PR #5089 which does not seem to be on the 8.1 branch either.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bdgranger commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
bdgranger commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-800536478


   An example of the Http1ClientSession::release crash:
   
   ```c++
   (gdb) bt
   #0  0x0000000000718022 in Mutex_trylock (t=0x2b7d8f341010, m=<optimized out>)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:287
   #1  MutexTryLock (t=0x2b7d8f341010, am=..., this=0x2b7d95ac7a60)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:555
   #2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
       handle_event=<synthetic pointer>, now=1615727914286925795, vc=<optimized out>, this=0x2b7d8f344cf0) at UnixNet.cc:648
   #3  manage_keep_alive_queue (this=0x2b7d8f344cf0) at UnixNet.cc:626
   #4  NetHandler::add_to_keep_alive_queue (this=0x2b7d8f344cf0, vc=<optimized out>) at UnixNet.cc:698
   #5  0x00000000004e63c5 in Http1ClientSession::release (this=0x2b7daa9663c0, trans=0x2b7daa9666f0) at Http1ClientSession.cc:485
   #6  0x00000000006c3c5d in ProxyClientTransaction::release (this=0x2b7daa9666f0, r=<optimized out>)
       at ProxyClientTransaction.cc:71
   #7  0x00000000004f6540 in HttpSM::tunnel_handler_ua (this=0x2b866faebd80, event=103, c=0x2b866faeccc0) at HttpSM.cc:3274
   #8  0x000000000055bd61 in HttpTunnel::consumer_handler (this=this@entry=0x2b866faecc80, event=103, c=0x2b866faeccc0)
       at HttpTunnel.cc:1389
   #9  0x000000000055c214 in HttpTunnel::main_handler (this=0x2b866faecc80, event=<optimized out>, data=<optimized out>)
       at HttpTunnel.cc:1678
   #10 0x000000000048c8f2 in handleEvent (data=0x2b81106a4528, event=<optimized out>, this=<optimized out>)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Continuation.h:160
   #11 write_signal_and_update (vc=0x2b81106a4320, event=<optimized out>) at UnixNetVConnection.cc:114
   #12 write_signal_done (event=<optimized out>, nh=0x2b7d8f344cf0, vc=0x2b81106a4320) at UnixNetVConnection.cc:156
   #13 0x000000000072ca34 in write_to_net_io (nh=nh@entry=0x2b7d8f344cf0, vc=0x2b81106a4320, thread=<optimized out>)
       at UnixNetVConnection.cc:523
   #14 0x000000000072ccd8 in write_to_net (nh=nh@entry=0x2b7d8f344cf0, vc=vc@entry=0x2b81106a4320, thread=<optimized out>)
       at UnixNetVConnection.cc:368
   #15 0x00000000007154cf in NetHandler::process_ready_list (this=this@entry=0x2b7d8f344cf0) at UnixNet.cc:416
   #16 0x000000000071578d in NetHandler::waitForActivity (this=0x2b7d8f344cf0, timeout=<optimized out>) at UnixNet.cc:532
   #17 0x0000000000761d09 in EThread::execute_regular (this=0x2b7d8f341010) at UnixEThread.cc:278
   #18 0x000000000076030a in spawn_thread_internal (a=0x17b3b40) at Thread.cc:85
   #19 0x00002b7d8b976e65 in start_thread () from /lib64/libpthread.so.0
   #20 0x00002b7d8c6ac88d in clone () from /lib64/libc.so.6
   ```
   
   bt full:
   ```c++
   (gdb) bt full
   #0  0x0000000000718022 in Mutex_trylock (t=0x2b7d8f341010, m=<optimized out>)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:287
   No locals.
   #1  MutexTryLock (t=0x2b7d8f341010, am=..., this=0x2b7d95ac7a60)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:555
   No locals.
   #2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
       handle_event=<synthetic pointer>, now=1615727914286925795, vc=<optimized out>, this=0x2b7d8f344cf0) at UnixNet.cc:648
           lock = {m = {m_ptr = 0x0}, lock_acquired = false}
           diff = <optimized out>
   #3  manage_keep_alive_queue (this=0x2b7d8f344cf0) at UnixNet.cc:626
           vc = 0x2b80f4393400
           total_connections_in = <optimized out>
           vc_next = 0x0
           closed = 0
           total_idle_time = 0
           handle_event = 0
           total_idle_count = 0
   #4  NetHandler::add_to_keep_alive_queue (this=0x2b7d8f344cf0, vc=<optimized out>) at UnixNet.cc:698
           __FUNCTION__ = "add_to_keep_alive_queue"
   #5  0x00000000004e63c5 in Http1ClientSession::release (this=0x2b7daa9663c0, trans=0x2b7daa9666f0) at Http1ClientSession.cc:485
           more_to_read = false
   #6  0x00000000006c3c5d in ProxyClientTransaction::release (this=0x2b7daa9666f0, r=<optimized out>)
       at ProxyClientTransaction.cc:71
           __FUNCTION__ = "release"
   #7  0x00000000004f6540 in HttpSM::tunnel_handler_ua (this=0x2b866faebd80, event=103, c=0x2b866faeccc0) at HttpSM.cc:3274
           close_connection = false
           p = <optimized out>
           selfc = <optimized out>
           __FUNCTION__ = "tunnel_handler_ua"
           original_source = <optimized out>
   #8  0x000000000055bd61 in HttpTunnel::consumer_handler (this=this@entry=0x2b866faecc80, event=103, c=0x2b866faeccc0)
       at HttpTunnel.cc:1389
           sm_callback = false
           jump_point = <optimized out>
           p = 0x2b866faece80
           __FUNCTION__ = "consumer_handler"
   #9  0x000000000055c214 in HttpTunnel::main_handler (this=0x2b866faecc80, event=<optimized out>, data=<optimized out>)
       at HttpTunnel.cc:1678
           p = <optimized out>
           c = <optimized out>
           sm_callback = false
   #10 0x000000000048c8f2 in handleEvent (data=0x2b81106a4528, event=<optimized out>, this=<optimized out>)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Continuation.h:160
   No locals.
   #11 write_signal_and_update (vc=0x2b81106a4320, event=<optimized out>) at UnixNetVConnection.cc:114
   No locals.
   #12 write_signal_done (event=<optimized out>, nh=0x2b7d8f344cf0, vc=0x2b81106a4320) at UnixNetVConnection.cc:156
   No locals.
   #13 0x000000000072ca34 in write_to_net_io (nh=nh@entry=0x2b7d8f344cf0, vc=0x2b81106a4320, thread=<optimized out>)
       at UnixNetVConnection.cc:523
           wbe_event = <optimized out>
           e = <optimized out>
           s = 0x2b81106a4520
           lock = {m = {m_ptr = 0x2b7de2fcf410}, lock_acquired = true}
           r_ip = "79.27.137.59\000\000\000"
           buf = @0x2b81106a4548: {mbuf = 0x2b8668177630, entry = 0x2b8668177670}
           towrite = 105066
           signalled = <optimized out>
           needs = -2147483644
           total_written = 105066
           r = 105066
           __FUNCTION__ = "write_to_net_io"
   #14 0x000000000072ccd8 in write_to_net (nh=nh@entry=0x2b7d8f344cf0, vc=vc@entry=0x2b81106a4320, thread=<optimized out>)
       at UnixNetVConnection.cc:368
   No locals.
   #15 0x00000000007154cf in NetHandler::process_ready_list (this=this@entry=0x2b7d8f344cf0) at UnixNet.cc:416
           vc = <optimized out>
   #16 0x000000000071578d in NetHandler::waitForActivity (this=0x2b7d8f344cf0, timeout=<optimized out>) at UnixNet.cc:532
           epd = <optimized out>
           lock = {m = {m_ptr = 0x2b7d90002d60}, locked_p = true}
           pd = 0x2b7d95f12010
           vc = <optimized out>
           __FUNCTION__ = "waitForActivity"
   #17 0x0000000000761d09 in EThread::execute_regular (this=0x2b7d8f341010) at UnixEThread.cc:278
           done_one = <optimized out>
           sleep_time = <optimized out>
           e = <optimized out>
           NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0}
           next_time = <optimized out>
           delta = <optimized out>
           prev_metric = <optimized out>
           nq_count = 2
           ev_count = 2
           METRIC_INIT = {_loop_time = {_start = 0, _min = 9223372036854775807, _max = 0}, _events = {_min = 2147483647, _max = 0,
               _total = 0}, _count = 0, _wait = 0}
   #18 0x000000000076030a in spawn_thread_internal (a=0x17b3b40) at Thread.cc:85
           p = 0x17b3b40
   #19 0x00002b7d8b976e65 in start_thread () from /lib64/libpthread.so.0
   No symbol table info available.
   #20 0x00002b7d8c6ac88d in clone () from /lib64/libc.so.6
   No symbol table info available.
   ```
   HttpSM history (from frame 7):
   ```c++
   (gdb) frame 7
   #7  0x00000000004f6540 in HttpSM::tunnel_handler_ua (this=0x2b866faebd80, event=103, c=0x2b866faeccc0) at HttpSM.cc:3274
   3274        ua_txn->release(ua_buffer_reader);
   (gdb) print history
   $1 = {history = {{location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781500 <HttpSM::state_read_client_request_header(int, void*)::__FUNCTION__> "state_read_client_request_header",
           line = 586}, event = 100, reentrancy = 2}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781390 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1290},
         event = 60000, reentrancy = 3}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781370 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1330}, event = 60000,
         reentrancy = 3}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781390 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1290},
         event = 60000, reentrancy = 4}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781370 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1330}, event = 60000,
         reentrancy = 4}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781390 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1290},
         event = 60000, reentrancy = 5}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781370 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1330}, event = 60000,
         reentrancy = 5}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781390 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1290},
         event = 60000, reentrancy = 6}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781370 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1330}, event = 60000,
         reentrancy = 6}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x780918 <HttpSM::set_next_state()::__FUNCTION__> "set_next_state", line = 7308}, event = 34463, reentrancy = 6},
       {location = {file = 0x77d5de "HttpSM.cc", func = 0x780918 <HttpSM::set_next_state()::__FUNCTION__> "set_next_state",
           line = 7348}, event = 34463, reentrancy = 6}, {location = {file = 0x78cc60 "HttpCacheSM.cc",
           func = 0x78ceb0 <HttpCacheSM::state_cache_open_read(int, void*)::__FUNCTION__> "state_cache_open_read", line = 116},
         event = 1102, reentrancy = -31073}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x7811a0 <HttpSM::state_cache_open_read(int, void*)::__FUNCTION__> "state_cache_open_read", line = 2474},
         event = 1102, reentrancy = 7}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781390 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1290},
         event = 60000, reentrancy = 8}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781370 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1330}, event = 60000,
         reentrancy = 8}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781390 <HttpSM::state_api_callback(int, void*)::__FUNCTION__> "state_api_callback", line = 1290},
         event = 60000, reentrancy = 9}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781370 <HttpSM::state_api_callout(int, void*)::__FUNCTION__> "state_api_callout", line = 1330}, event = 60000,
         reentrancy = 9}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x780b90 <HttpSM::setup_cache_read_transfer()::__FUNCTION__> "setup_cache_read_transfer", line = 6103},
         event = 34463, reentrancy = 9}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781020 <HttpSM::tunnel_handler_cache_read(int, HttpTunnelProducer*)::__FUNCTION__> "tunnel_handler_cache_read",
           line = 3325}, event = 102, reentrancy = 6}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x780db0 <HttpSM::do_cache_lookup_and_read()::__FUNCTION__> "do_cache_lookup_and_read", line = 4532}, event = 0,
         reentrancy = 6}, {location = {file = 0x77d5de "HttpSM.cc",
           func = 0x781060 <HttpSM::tunnel_handler_ua(int, HttpTunnelConsumer*)::__FUNCTION__> "tunnel_handler_ua", line = 3156},
         event = 103, reentrancy = 0}, {location = {file = 0x0, func = 0x0, line = 0}, event = 0,
         reentrancy = 0} <repeats 44 times>}, history_pos = 21}
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bdgranger commented on issue #7581: Sporadic crash and core dump in Mutex_trylock

Posted by GitBox <gi...@apache.org>.
bdgranger commented on issue #7581:
URL: https://github.com/apache/trafficserver/issues/7581#issuecomment-800538431


   An example of the Http2 stream_release crash:
   
   ```c++
   (gdb) bt
   #0  0x0000000000718022 in Mutex_trylock (t=0x2b9c77445010, m=<optimized out>)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:287
   #1  MutexTryLock (t=0x2b9c77445010, am=..., this=0x2b9c7df4db60)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:555
   #2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
       handle_event=<synthetic pointer>, now=1615767053666485882, vc=<optimized out>, this=0x2b9c77448cf0) at UnixNet.cc:648
   #3  manage_keep_alive_queue (this=0x2b9c77448cf0) at UnixNet.cc:626
   #4  NetHandler::add_to_keep_alive_queue (this=0x2b9c77448cf0, vc=<optimized out>) at UnixNet.cc:698
   #5  0x00000000005854c2 in Http2ConnectionState::release_stream (this=this@entry=0x2b9ccb364f78,
       stream=stream@entry=0x2ba7d54626c0) at Http2ConnectionState.cc:1403
   #6  0x00000000005924df in Http2Stream::destroy (this=0x2ba7d54626c0) at Http2Stream.cc:784
   #7  0x000000000058ea8f in Http2Stream::terminate_if_possible (this=this@entry=0x2ba7d54626c0) at Http2Stream.cc:414
   #8  0x0000000000591a37 in Http2Stream::main_event_handler (this=0x2ba7d54626c0, event=<optimized out>, edata=<optimized out>)
       at Http2Stream.cc:149
   #9  0x0000000000760912 in handleEvent (data=0x2ba7d9fdd340, event=103, this=<optimized out>) at I_Continuation.h:160
   #10 EThread::process_event (this=this@entry=0x2b9c77445010, e=e@entry=0x2ba7d9fdd340, calling_code=103) at UnixEThread.cc:131
   #11 0x00000000007611de in EThread::process_queue (this=this@entry=0x2b9c77445010,
       NegativeQueue=NegativeQueue@entry=0x2b9c7df4de70, ev_count=ev_count@entry=0x2b9c7df4de6c,
       nq_count=nq_count@entry=0x2b9c7df4de68) at UnixEThread.cc:170
   #12 0x00000000007617c8 in EThread::execute_regular (this=0x2b9c77445010) at UnixEThread.cc:230
   #13 0x000000000076030a in spawn_thread_internal (a=0x187b5d0) at Thread.cc:85
   #14 0x00002b9c7331fe65 in start_thread () from /lib64/libpthread.so.0
   #15 0x00002b9c7405588d in clone () from /lib64/libc.so.6
   ```
   
   full bt:
   ```c++
   (gdb) bt full
   #0  0x0000000000718022 in Mutex_trylock (t=0x2b9c77445010, m=<optimized out>)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:287
   No locals.
   #1  MutexTryLock (t=0x2b9c77445010, am=..., this=0x2b9c7df4db60)
       at /usr/src/debug/trafficserver-8.1.2/iocore/eventsystem/I_Lock.h:555
   No locals.
   #2  _close_vc (total_idle_count=<synthetic pointer>, total_idle_time=<synthetic pointer>, closed=<synthetic pointer>,
       handle_event=<synthetic pointer>, now=1615767053666485882, vc=<optimized out>, this=0x2b9c77448cf0) at UnixNet.cc:648
           lock = {m = {m_ptr = 0x0}, lock_acquired = true}
           diff = <optimized out>
   #3  manage_keep_alive_queue (this=0x2b9c77448cf0) at UnixNet.cc:626
           vc = 0x2b9fee0b4000
           total_connections_in = <optimized out>
           vc_next = 0x0
           closed = 0
           total_idle_time = 0
           handle_event = 0
           total_idle_count = 0
   #4  NetHandler::add_to_keep_alive_queue (this=0x2b9c77448cf0, vc=<optimized out>) at UnixNet.cc:698
           __FUNCTION__ = "add_to_keep_alive_queue"
   #5  0x00000000005854c2 in Http2ConnectionState::release_stream (this=this@entry=0x2b9ccb364f78,
       stream=stream@entry=0x2ba7d54626c0) at Http2ConnectionState.cc:1403
           vc = <optimized out>
           __FUNCTION__ = "release_stream"
           lock = {m = {m_ptr = 0x2ba2b7624950}, locked_p = true}
   #6  0x00000000005924df in Http2Stream::destroy (this=0x2ba7d54626c0) at Http2Stream.cc:784
           h2_proxy_ssn = 0x2b9ccb364cb0
           lock = {m = {m_ptr = 0x2ba2b7624950}, locked_p = true}
           __FUNCTION__ = "destroy"
           lock = {m = {m_ptr = 0x2b9fed9c5670}, locked_p = true}
           cid = 23156238
   #7  0x000000000058ea8f in Http2Stream::terminate_if_possible (this=this@entry=0x2ba7d54626c0) at Http2Stream.cc:414
           h2_proxy_ssn = 0x2b9ccb364cb0
           lock = {m = {m_ptr = 0x2ba2b7624950}, locked_p = true}
           __FUNCTION__ = "terminate_if_possible"
   #8  0x0000000000591a37 in Http2Stream::main_event_handler (this=0x2ba7d54626c0, event=<optimized out>, edata=<optimized out>)
       at Http2Stream.cc:149
           lock = {m = {m_ptr = 0x2b9fed9c5670}, locked_p = true}
           __FUNCTION__ = "main_event_handler"
           e = <optimized out>
   #9  0x0000000000760912 in handleEvent (data=0x2ba7d9fdd340, event=103, this=<optimized out>) at I_Continuation.h:160
   No locals.
   #10 EThread::process_event (this=this@entry=0x2b9c77445010, e=e@entry=0x2ba7d9fdd340, calling_code=103) at UnixEThread.cc:131
           c_temp = <optimized out>
           lock = {m = {m_ptr = 0x2b9fed9c5670}, lock_acquired = true}
   #11 0x00000000007611de in EThread::process_queue (this=this@entry=0x2b9c77445010,
       NegativeQueue=NegativeQueue@entry=0x2b9c7df4de70, ev_count=ev_count@entry=0x2b9c7df4de6c,
       nq_count=nq_count@entry=0x2b9c7df4de68) at UnixEThread.cc:170
           e = 0x2ba7d9fdd340
   #12 0x00000000007617c8 in EThread::execute_regular (this=0x2b9c77445010) at UnixEThread.cc:230
           done_one = <optimized out>
           sleep_time = <optimized out>
           e = <optimized out>
           NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x0}, tail = 0x0}
           next_time = <optimized out>
           delta = <optimized out>
           prev_metric = 0x2b9c7754bf30
           nq_count = 184
           ev_count = 185
           METRIC_INIT = {_loop_time = {_start = 0, _min = 9223372036854775807, _max = 0}, _events = {_min = 2147483647, _max = 0,
               _total = 0}, _count = 0, _wait = 0}
   #13 0x000000000076030a in spawn_thread_internal (a=0x187b5d0) at Thread.cc:85
           p = 0x187b5d0
   #14 0x00002b9c7331fe65 in start_thread () from /lib64/libpthread.so.0
   No symbol table info available.
   #15 0x00002b9c7405588d in clone () from /lib64/libc.so.6
   No symbol table info available.
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org