You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by GitBox <gi...@apache.org> on 2020/10/12 02:20:21 UTC

[GitHub] [trafficserver] calavera commented on issue #7255: Deadlock in UnixNetVConnection with 9.0.x and http 1

calavera commented on issue #7255:
URL: https://github.com/apache/trafficserver/issues/7255#issuecomment-706818279


   I was wrong, it looks like it happens during SSN_START, after the hook is triggered. The event is a TS_EVENT_HTTP_CONTINUE triggered with TSHttpSsnReenable.
   
   I changed our plugin to not go through trafficserver for that http request, going directly to the origin and not caching anything. It still fails in the same way, so that's not what's causing the problem.
   
   I got this stacktrace in gdb:
   
   ```
   (gdb) backtrace
   #0  0x00007ffff752318b in raise () from /lib/x86_64-linux-gnu/libc.so.6
   #1  0x00007ffff7502859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
   #2  0x00007ffff7f41a5a in ink_abort (message_format=message_format@entry=0x7ffff7fa3b6f "%s:%d: failed assertion `%s`") at ink_error.cc:99
   #3  0x00007ffff7f3ef24 in _ink_assert (expression=expression@entry=0x5555559d7110 "!\"BUG: It must have acquired the NetHandler's lock before doing anything on keep_alive_queue.\"", 
       file=file@entry=0x5555559d6b63 "UnixNetVConnection.cc", line=line@entry=1449) at ink_assert.cc:37
   #4  0x000055555593d321 in UnixNetVConnection::add_to_keep_alive_queue (this=0x7fff70027b20) at UnixNetVConnection.cc:1449
   #5  0x00005555556b70a9 in Http1ClientSession::release (this=0x7fffe81f5bc0, trans=0x7fffe81f5f38) at Http1ClientSession.cc:419
   #6  0x00005555556b56ed in Http1ClientSession::start (this=<optimized out>) at Http1ClientSession.cc:509
   #7  0x00005555558e03e9 in ProxySession::handle_api_return (this=this@entry=0x7fffe81f5bc0, event=event@entry=60000) at ProxySession.cc:166
   #8  0x00005555558e0b81 in ProxySession::state_api_callout (this=0x7fffe81f5bc0, event=60000, data=<optimized out>) at ProxySession.cc:121
   #9  0x0000555555680dcb in Continuation::handleEvent (data=0x0, event=60000, this=0x7fffe81f5bc0) at ./iocore/eventsystem/I_Continuation.h:190
   #10 TSHttpSsnCallback::event_handler (this=0x7fff7c002260) at traffic_server/InkAPI.cc:4906
   #11 0x0000555555978d4a in Continuation::handleEvent (data=0x555555cb7f80, event=1, this=0x7fff7c002260) at I_Continuation.h:190
   #12 EThread::process_event (this=this@entry=0x7ffff66ab010, e=e@entry=0x555555cb7f80, calling_code=1) at UnixEThread.cc:127
   #13 0x00005555559793de in EThread::process_queue (this=this@entry=0x7ffff66ab010, NegativeQueue=NegativeQueue@entry=0x7ffff5fdcde0, ev_count=ev_count@entry=0x7ffff5fdcdd4, 
       nq_count=nq_count@entry=0x7ffff5fdcdd0) at UnixEThread.cc:162
   #14 0x00005555559798a8 in EThread::execute_regular (this=this@entry=0x7ffff66ab010) at UnixEThread.cc:222
   #15 0x000055555597a22f in EThread::execute (this=0x7ffff66ab010) at UnixEThread.cc:327
   #16 0x00005555559782bb in spawn_thread_internal (a=0x555555caf890) at Thread.cc:92
   #17 0x00007ffff7a2b609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
   #18 0x00007ffff75ff293 in clone () from /lib/x86_64-linux-gnu/libc.so.6
   ```
   
   These are some more logs that I captured with my debug version:
   
   ```
   [Oct 12 02:11:32.972] [ET_NET 1] DEBUG: <SSLNetVConnection.cc:1288 (sslServerHandShakeEvent)> (ssl) SSL server handshake completed successfully
   [Oct 12 02:11:32.972] [ET_NET 1] DEBUG: <SSLNetVConnection.cc:1302 (sslServerHandShakeEvent)> (ssl) ssl handshake time:130845435
   [Oct 12 02:11:32.972] [ET_NET 1] DEBUG: <SSLNetVConnection.cc:1325 (sslServerHandShakeEvent)> (ssl) client selected next protocol 'http/1.1'
   [Oct 12 02:11:32.972] [ET_NET 1] DEBUG: <SSLNetVConnection.cc:632 (net_read_io)> (ssl) ssl handshake EVENT_DONE ntodo=0
   [Oct 12 02:11:32.972] [ET_NET 1] DEBUG: <Http1ClientSession.cc:148 (new_connection)> (ssl_early_data) read_from_early_data = 0
   [Oct 12 02:11:32.972] [ET_NET 1] DEBUG: <Http1ClientSession.cc:193 (new_connection)> (http_cs) [1] session born, netvc 0x7fff70027b20
   [Oct 12 02:11:32.972] [ET_NET 1] DIAG: (netlify_traffic_mesh_resolve) Handling session event event=60013
   [Oct 12 02:11:32.972] [ET_NET 1] DIAG: (netlify_traffic_mesh_resolve) Resolving bundle...                          host=votelify.edge-handlers.dev ssn_start=true
   [Oct 12 02:11:32.972] [ET_NET 1] DIAG: (netlify_resolve_async) Resolving bundle...                          host=votelify.edge-handlers.dev
   [New Thread 0x7fffa467f700 (LWP 247)]
   [New Thread 0x7fffa447e700 (LWP 248)]
   [New Thread 0x7fffa427d700 (LWP 249)]
   [New Thread 0x7fff844a4700 (LWP 250)]
   [New Thread 0x7fff842a3700 (LWP 251)]
   [New Thread 0x7fff7bfff700 (LWP 252)]
   [Oct 12 02:11:32.978] mesh-tokio-work DIAG: (netlify_resolve_async) Bundle resolution done
   [Oct 12 02:11:32.978] mesh-tokio-work DIAG: (netlify_traffic_mesh_resolve) Bundle resolution async completed found=true
   [Oct 12 02:11:32.978] [ET_NET 2] DEBUG: <Http1ClientSession.cc:411 (release)> (http_cs) [1] initiating io for next header
   Fatal: UnixNetVConnection.cc:1449: failed assertion `!"BUG: It must have acquired the NetHandler's lock before doing anything on keep_alive_queue."`
   --Type <RET> for more, q to quit, c to continue without paging--RET
   
   Thread 5 "[ET_NET 2]" received signal SIGABRT, Aborted.
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org