Posted to issues@trafficserver.apache.org by GitBox <gi...@apache.org> on 2020/09/13 23:14:25 UTC

[GitHub] [trafficserver] sudheerv opened a new issue #6849: ssl_read_from_net assert failure for null buffer block

sudheerv opened a new issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849


   We are seeing the crash below in production on our first attempt to test 9.0.
   
   ```
   (gdb) bt
   #0  0x00002aaec7fe24ad in vfprintf () from /lib64/libc.so.6
   #1  0x00002aaec7fed287 in fprintf () from /lib64/libc.so.6
   #2  0x00002aaec569011d in fatal_va(const char *, const char *, typedef __va_list_tag __va_list_tag *) (hdr=hdr@entry=0x2aaec570dafd "Fatal: ", fmt=fmt@entry=0x2aaec570d2a7 "%s:%d: failed assertion `%s`", ap=ap@entry=0x2aaed9104858) at ink_error.cc:47
   #3  0x00002aaec5690356 in ink_abort (message_format=message_format@entry=0x2aaec570d2a7 "%s:%d: failed assertion `%s`") at ink_error.cc:96
   #4  0x00002aaec568d045 in _ink_assert (expression=expression@entry=0x8210bd "current_block != nullptr", file=file@entry=0x820e40 "SSLNetVConnection.cc", line=line@entry=271) at ink_assert.cc:37
   #5  0x0000000000742dc3 in ssl_read_from_net (ret=<synthetic pointer>, lthread=<optimized out>, sslvc=0x2aaf5f602340) at SSLNetVConnection.cc:271
   #6  SSLNetVConnection::net_read_io (this=0x2aaf5f602340, nh=0x2aaecd809d80, lthread=<optimized out>) at SSLNetVConnection.cc:662
   #7  0x0000000000761b48 in NetHandler::process_ready_list (this=this@entry=0x2aaecd809d80) at UnixNet.cc:412
   #8  0x0000000000761e3d in NetHandler::waitForActivity (this=0x2aaecd809d80, timeout=<optimized out>) at UnixNet.cc:547
   #9  0x00000000007c02fa in EThread::execute_regular (this=this@entry=0x2aaecd806000) at UnixEThread.cc:266
   #10 0x00000000007c05c2 in EThread::execute (this=0x2aaecd806000) at UnixEThread.cc:327
   #11 0x00000000007be969 in spawn_thread_internal (a=0x2aaec9565180) at Thread.cc:92
   #12 0x00002aaec72e6dd5 in start_thread () from /lib64/libpthread.so.0
   #13 0x00002aaec8097ead in clone () from /lib64/libc.so.6
   (gdb) 
   (gdb) p (*(IOBufferBlock *) 0x2aaf619e0880).next.m_ptr 
   $15 = (IOBufferBlock *) 0x2aaf2a5ad480
   (gdb) p (*(*(IOBufferBlock *) 0x2aaf619e0880).next.m_ptr)
   $12 = {
     <RefCountObj> = {
       <ForceVFPTToTop> = {
         _vptr.ForceVFPTToTop = 0x7c5b68 <vtable for IOBufferBlock+16>
       }, 
       members of RefCountObj: 
       m_refcount = 1
     }, 
     members of IOBufferBlock: 
     _start = 0x0, 
     _end = 0x0, 
     _buf_end = 0x3532254332353215 <Address 0x3532254332353215 out of bounds>, 
     _location = 0x7e0a20 "memory/IOBuffer/HttpSM.cc:5840", 
     data = {
       m_ptr = 0x2aaed5736d80
     }, 
     next = {
       m_ptr = 0x0
     }
   }
   (gdb) p (*(*(IOBufferBlock *) 0x2aaf619e0880).next.m_ptr)._start
   $17 = 0x0
   (gdb) p (*(*(IOBufferBlock *) 0x2aaf619e0880).next.m_ptr)._buf_end
   $18 = 0x3532254332353215 <Address 0x3532254332353215 out of bounds>
   (gdb) p (*(*(IOBufferBlock *) 0x2aaf619e0880).next.m_ptr)._end
   $13 = 0x0
   (gdb) 
   (gdb) p (*(*(IOBufferBlock *) 0x2aaf619e0880).next.m_ptr).next.m_ptr
   $11 = (IOBufferBlock *) 0x0
   (gdb) 
   
   (gdb) p *(*(IOBufferBlock *) 0x2aaf619e0880).next.m_ptr.data.m_ptr
   $4 = {
     <RefCountObj> = {
       <ForceVFPTToTop> = {
         _vptr.ForceVFPTToTop = 0x7c5b40 <vtable for IOBufferData+16>
       }, 
       members of RefCountObj: 
       m_refcount = 1
     }, 
     members of IOBufferData: 
     _size_index = 3833167203381096996, 
     _mem_type = DEFAULT_ALLOC, 
     _data = 0x0, 
     _location = 0x7e0a20 "memory/IOBuffer/HttpSM.cc:5840"
   }
   
   
   (gdb) p *(*(IOBufferBlock *) 0x2aaf619e0880).data.m_ptr
   $5 = {
     <RefCountObj> = {
       <ForceVFPTToTop> = {
         _vptr.ForceVFPTToTop = 0x7c5b40 <vtable for IOBufferData+16>
       }, 
       members of RefCountObj: 
       m_refcount = 2
     }, 
     members of IOBufferData: 
     _size_index = 5, 
     _mem_type = DEFAULT_ALLOC, 
     _data = 0x2aaf60059000 "fZEERBbg3lcGkSgSxwSEiO8Ho; sl=\"v=1&sGsuF\"; lidc=\"b=OB50:s=O:r=O:g=2299:u=15:i=1591116679:t=1591203079:v=1:sig=AQE4cHsiP5eLt8aPshPpFX0rbvF3arjp\"\r\nHost: www.linkedin.com\r\nUser-Agent: Mozilla/5.0 (iPhone"..., 
     _location = 0x81cc70 "memory/IOBuffer/ProtocolProbeSessionAccept.cc:59"
   }
   
   
   
   ```
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] masaori335 edited a comment on issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
masaori335 edited a comment on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-640297605


   Nice catch! ~~Does this mean bad error handling when calling `buffer_size_to_index()` in this case?
   I don't object to bumping the default value or removing it, but even if we bump the default value or explicitly set the `max`, the `size` could be larger than it and the function returns an error.~~
   
   ----
   Update: One factor seems to be that `buffer_size_to_index()` returns an invalid index when the given `max` is out of range (0 - 14). I wonder whether `buffer_size_to_index()` should return `DEFAULT_BUFFER_SIZES` or `-1` as an error in that case.
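
The idea of returning `-1` for an out-of-range `max` can be sketched as below. This is a minimal illustration, not the Traffic Server implementation: the function name `checked_size_to_index`, the constants, and the 128-byte base block size are assumptions made for the example, and rejecting a `size` that does not fit (rather than clamping it) is a design choice of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants: the 0 - 14 index range comes from the thread,
// the 128-byte base block size is an assumption for this sketch.
constexpr int64_t kBaseSize     = 128;
constexpr int64_t kMaxSizeIndex = 14;

// Returns the smallest index whose block size (kBaseSize << index) can hold
// `size`, or -1 when `max_index` is out of range, `size` is negative, or
// `size` does not fit within `max_index`.
int checked_size_to_index(int64_t size, int64_t max_index)
{
  if (max_index < 0 || max_index > kMaxSizeIndex) {
    return -1; // a corrupted or misconfigured max is rejected up front
  }
  if (size < 0) {
    return -1;
  }
  int index = 0;
  while (index <= max_index && (kBaseSize << index) < size) {
    ++index;
  }
  if (index > max_index) {
    return -1; // size is larger than the largest permitted block
  }
  assert((kBaseSize << index) >= size); // postcondition: the block is big enough
  return index;
}
```

With this shape, a garbage `max` such as the `4426017152009854830` seen in the gdb output would produce `-1` instead of an index that later corrupts the buffer, and the caller has to decide what to do with the error.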





[GitHub] [trafficserver] sudheerv commented on issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
sudheerv commented on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-641021006


   Yeah, I double-checked, and `max_iobuffer_size` doesn't seem to get corrupted during startup. It seems to get corrupted at some point during the run; otherwise, it would crash much more frequently, or even all the time. A few fixes and, most likely, reverting PR 4028 seem to have solved this.
   
   Additionally, https://github.com/apache/trafficserver/pull/6869 forces the caller to always pass a max iobuf size and not rely on the global default at all. Letting callers pass the max explicitly gives more flexibility to customize the max for different use cases (e.g., some users may set a high HostDB buffer size while others may set a high POST buffer size).





[GitHub] [trafficserver] sudheerv commented on issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
sudheerv commented on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-649532348


   > Could you add an assert to make sure `max_iobuffer_size` is between 0 and MAX_BUFFER_SIZE_INDEX (14) at startup?
   > 
   > ```
   >    max_iobuffer_size = buffer_size_to_index(config_max_iobuffer_size, DEFAULT_BUFFER_SIZES - 1);
   > +  ink_release_assert(0 <= max_iobuffer_size && max_iobuffer_size <= MAX_BUFFER_SIZE_INDEX);
   > ```
   
   @masaori335 Just FYI, it looks like the root cause of the corruption of globals is #6950. I was finally able to run ASAN in a way that could catch this.





[GitHub] [trafficserver] sudheerv edited a comment on issue #6849: Crash in ssl_read_from_net due to assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
sudheerv edited a comment on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-638493946









[GitHub] [trafficserver] masaori335 edited a comment on issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
masaori335 edited a comment on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-691739386


   I got exactly the same crash on my 9.0.x branch (052b535b0066f3e009b4a1b66e060d483d25c680 + my changes) under very heavy load during a performance test.
   
   ```
   (gdb) bt
   #0  0x00007fd0b41731d9 in waitpid () from /lib64/libpthread.so.0
   #1  0x000055d19ba56744 in crash_logger_invoke(int, siginfo_t*, void*) (signo=6, info=0x7fd0ab50ac70, ctx=0x7fd0ab50ab40)
       at traffic_server/Crash.cc:168
   #2  <signal handler called>
   #3  0x00007fd0b34c1387 in raise () from /lib64/libc.so.6
   #4  0x00007fd0b34c2a78 in abort () from /lib64/libc.so.6
   #5  0x00007fd0b6014fe4 in ink_abort(char const*, ...) (message_format=<optimized out>) at ink_error.cc:99
   #6  0x00007fd0b6012425 in _ink_assert (expression=0x7fd0b34c1387 <raise+55> "H=", file=0x196a <Address 0x196a out of bounds>, line=6)
       at ink_assert.cc:37
   #7  0x000055d19bc5c73b in ssl_read_from_net (sslvc=0x7fd012ace3c0, lthread=0x7fd0af00d740, ret=<optimized out>)
       at SSLNetVConnection.cc:275
   #8  SSLNetVConnection::net_read_io(NetHandler*, EThread*) (this=0x7fd012ace3c0, nh=0x7fd0af0117d0, lthread=0x7fd0af00d740)
       at SSLNetVConnection.cc:666
   #9  0x000055d19bc7a163 in NetHandler::process_ready_list() (this=0x7fd0af0117d0) at UnixNet.cc:412
   #10 0x000055d19bc7a972 in NetHandler::waitForActivity(long) (this=<optimized out>, timeout=<optimized out>) at UnixNet.cc:547
   #11 0x000055d19bc7aa4d in non-virtual thunk to NetHandler::waitForActivity(long) ()
   #12 0x000055d19bcaf18f in EThread::execute_regular() (this=0x7fd0af00d740) at UnixEThread.cc:266
   #13 0x000055d19bcaf39a in EThread::execute() (this=0x7fd0af00d740) at UnixEThread.cc:327
   #14 0x000055d19bcadf8b in spawn_thread_internal(void*) (a=0x7fd0b2850bd0) at Thread.cc:92
   #15 0x00007fd0b416bea5 in start_thread () from /lib64/libpthread.so.0
   #16 0x00007fd0b35898dd in clone () from /lib64/libc.so.6
   ```
   
   The symptom looks the same: the `size_index` of the MIOBuffer is garbage (`140524527145360`).
   ```
   (gdb) frame 7
   #7  0x000055d19bc5c73b in ssl_read_from_net (sslvc=0x7fd012ace3c0, lthread=0x7fd0af00d740, ret=<optimized out>)
       at SSLNetVConnection.cc:275
   275	    ink_release_assert(current_block != nullptr);
   (gdb) p *buf.mbuf
   $5 = {size_index = 140524527145360, water_mark = 0, _writer = {m_ptr = 0x7fcf0ed17540}, readers = {{accessor = 0x0, mbuf = 0x0,
         block = {m_ptr = 0x0}, start_offset = 0, size_limit = 9223372036854775807}, {accessor = 0x0, mbuf = 0x0, block = {m_ptr = 0x0},
         start_offset = 0, size_limit = 9223372036854775807}, {accessor = 0x0, mbuf = 0x0, block = {m_ptr = 0x0}, start_offset = 0,
         size_limit = 9223372036854775807}, {accessor = 0x0, mbuf = 0x0, block = {m_ptr = 0x0}, start_offset = 0,
         size_limit = 9223372036854775807}, {accessor = 0x0, mbuf = 0x0, block = {m_ptr = 0x0}, start_offset = 0,
         size_limit = 9223372036854775807}}, _location = 0x55d19bccbe25 "memory/IOBuffer/HttpSM.cc:6876"}
   (gdb) p *buf.mbuf->_writer.m_ptr
   $9 = {<RefCountObj> = {<> = {_vptr$ForceVFPTToTop = 0x55d19bf76b10 <vtable for IOBufferBlock+16>}, m_refcount = 1}, _start = 0x0,
     _end = 0x0, _buf_end = 0x7fce6a85d181 "\276̛\321U", _location = 0x55d19bccbe25 "memory/IOBuffer/HttpSM.cc:6876", data = {
       m_ptr = 0x7fce9d88a0f0}, next = {m_ptr = 0x0}}
   ```
   
   @sudheerv Does the 9.0.x branch have all the fixes related to this?





[GitHub] [trafficserver] masaori335 commented on issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
masaori335 commented on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-692526735


   Never mind, ASan detected a heap-buffer-overflow. That is the root cause.








[GitHub] [trafficserver] sudheerv commented on issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
sudheerv commented on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-640111965


   @zwoop I added more instrumentation to figure out where `_size_index` may be turning into garbage, and it seems to be caused by `max_iobuffer_size` somehow being garbage. It's supposed to be set from `proxy.config.io.max_buffer_size` during `ink_event_system_init()`, but I'm not sure whether there's some sort of initialization ordering issue or something else. When the caller relies on that default max instead of passing it in explicitly (as you pointed out), this garbage value sometimes messes up the returned allocSize, which corrupts the IOBuffer index, and from there it's all downhill. Regardless, the strategy you recommended, making sure callers pass it in explicitly and replacing the config-based value with a #define, sounds perfect to address this.
   
   
   ```
   (gdb) bt
   #0  0x00002b5b8de08207 in raise () from /lib64/libc.so.6
   #1  0x00002b5b8de098f8 in abort () from /lib64/libc.so.6
   #2  0x00002b5b8b4c835b in ink_abort (message_format=message_format@entry=0x2b5b8b5452a7 "%s:%d: failed assertion `%s`") at ink_error.cc:99
   #3  0x00002b5b8b4c5045 in _ink_assert (expression=expression@entry=0x7dd615 "r <= MAX_BUFFER_SIZE_INDEX", file=file@entry=0x7c1958 "/home/svinukon/Traffic/ATS/ats9/ats-core_trunk/ats9/src/iocore/eventsystem/P_IOBuffer.h", line=line@entry=46) at ink_assert.cc:37
   #4  0x0000000000606e3e in buffer_size_to_index (max=<optimized out>, size=<optimized out>) at /home/svinukon/Traffic/ATS/ats9/ats-core_trunk/ats9/src/iocore/eventsystem/P_IOBuffer.h:46
   #5  LogBuffer::LogBuffer (this=0x2b5b95f6b600, owner=<optimized out>, size=200000, buf_align=512, write_align=<optimized out>) at LogBuffer.cc:115
   #6  0x000000000061bc83 in LogObject::_checkout_write (this=0x2b5b8f3e4b80, write_offset=0x2b5b94604988, bytes_needed=18448) at LogObject.cc:399
   #7  0x000000000061f908 in log (text_entry=..., lad=0x2b5b94604a70, this=0x2b5b8f3e4b80) at LogObject.cc:596
   #8  log (text_entry=0x0, lad=0x2b5b94604a70, this=0x2b5b8f3e4b80) at LogObject.cc:517
   #9  LogObjectManager::log (this=0x2b5b8ecb2928, lad=lad@entry=0x2b5b94604a70) at LogObject.cc:1277
   #10 0x00000000005fac36 in Log::access (lad=lad@entry=0x2b5b94604a70) at Log.cc:1157
   #11 0x000000000054f6b7 in HttpSM::kill_this (this=this@entry=0x2b5b9ed767a0) at HttpSM.cc:7071
   #12 0x000000000054faaf in HttpSM::main_handler (this=0x2b5b9ed767a0, event=2301, data=0x2b5b9ed77720) at HttpSM.cc:2742
   #13 0x000000000059d457 in handleEvent (data=0x2b5b9ed77720, event=2301, this=0x2b5b9ed767a0) at /home/svinukon/Traffic/ATS/ats9/ats-core_trunk/ats9/src/iocore/eventsystem/I_Continuation.h:193
   #14 HttpTunnel::main_handler (this=0x2b5b9ed77720, event=<optimized out>, data=<optimized out>) at HttpTunnel.cc:1626
   #15 0x000000000076cd96 in handleEvent (data=0x2b5bf1793b00, event=103, this=0x2b5b9ed77720) at /home/svinukon/Traffic/ATS/ats9/ats-core_trunk/ats9/src/iocore/eventsystem/I_Continuation.h:193
   #16 write_signal_and_update (vc=0x2b5bf17938b0, event=103) at UnixNetVConnection.cc:115
   #17 write_signal_done (event=103, nh=0x2b5b90516d80, vc=0x2b5bf17938b0) at UnixNetVConnection.cc:161
   #18 0x000000000077368d in write_to_net_io (nh=0x2b5b90516d80, vc=0x2b5bf17938b0, thread=0x2b5b90513000) at UnixNetVConnection.cc:502
   #19 0x000000000075de88 in NetHandler::process_ready_list (this=this@entry=0x2b5b90516d80) at UnixNet.cc:429
   #20 0x000000000075e15d in NetHandler::waitForActivity (this=0x2b5b90516d80, timeout=<optimized out>) at UnixNet.cc:547
   #21 0x00000000007bbc0a in EThread::execute_regular (this=this@entry=0x2b5b90513000) at UnixEThread.cc:266
   #22 0x00000000007bbe92 in EThread::execute (this=0x2b5b90513000) at UnixEThread.cc:327
   #23 0x00000000007ba1e9 in spawn_thread_internal (a=0x2b5b8f169c00) at Thread.cc:92
   #24 0x00002b5b8d11edd5 in start_thread () from /lib64/libpthread.so.0
   #25 0x00002b5b8decfead in clone () from /lib64/libc.so.6
   
   (gdb) f 5
   #5  LogBuffer::LogBuffer (this=0x2b5b95f6b600, owner=<optimized out>, size=200000, buf_align=512, write_align=<optimized out>) at LogBuffer.cc:115
   115	LogBuffer.cc: No such file or directory.
   (gdb) p *this
   $1 = {write_link = {next = 0x0}, link = {<SLink<LogBuffer>> = {next = 0x0}, prev = 0x0}, static M_ID = 1013, m_unaligned_buffer = 0x434d433725303234 <Address 0x434d433725303234 out of bounds>, m_buffer = 0x303943372544494d <Address 0x303943372544494d out of bounds>, m_size = 200000, 
     m_buf_align = 512, m_write_align = 8, m_buffer_fast_allocator_size = 875903281, m_expiration_time = 5567354197238888496, m_owner = 0x2b5b8f3e4b80, m_header = 0x3333373431393531, m_id = 628308272, m_state = {ival = 0, s = {offset = 0, num_entries = 0, full = 0, num_writers = 0}}, 
     m_references = 0}
   (gdb) p  alloc_size
   $2 = 200512
   (gdb) p m_buffer_fast_allocator_size
   $3 = 875903281
   (gdb) p max_iobuffer_size
   $4 = 4426017152009854830
   
   ```
   
   ```
   (gdb) bt
   #0  0x00002b1a961a0207 in raise () from /lib64/libc.so.6
   #1  0x00002b1a961a18f8 in abort () from /lib64/libc.so.6
   #2  0x00002b1a9386035b in ink_abort (message_format=message_format@entry=0x2b1a938dd2a7 "%s:%d: failed assertion `%s`") at ink_error.cc:99
   #3  0x00002b1a9385d045 in _ink_assert (expression=expression@entry=0x7dd615 "r <= MAX_BUFFER_SIZE_INDEX", file=file@entry=0x7c1958 "/home/svinukon/Traffic/ATS/ats9/ats-core_trunk/ats9/src/iocore/eventsystem/P_IOBuffer.h", line=line@entry=46) at ink_assert.cc:37
   #4  0x000000000054a786 in buffer_size_to_index (max=<optimized out>, size=<optimized out>) at /home/svinukon/Traffic/ATS/ats9/ats-core_trunk/ats9/src/iocore/eventsystem/P_IOBuffer.h:46
   #5  HttpSM::do_setup_post_tunnel (this=0x2b1b62f6be90, to_vc_type=HTTP_SERVER_VC) at HttpSM.cc:5838
   #6  0x000000000054e1ad in HttpSM::state_send_server_request_header (this=0x2b1b62f6be90, event=103, data=0x2b1ac8036100) at HttpSM.cc:2065
   #7  0x000000000054fa58 in HttpSM::main_handler (this=0x2b1b62f6be90, event=103, data=0x2b1ac8036100) at HttpSM.cc:2729
   #8  0x000000000076cd96 in handleEvent (data=0x2b1ac8036100, event=103, this=0x2b1b62f6be90) at /home/svinukon/Traffic/ATS/ats9/ats-core_trunk/ats9/src/iocore/eventsystem/I_Continuation.h:193
   #9  write_signal_and_update (vc=0x2b1ac8035eb0, event=103) at UnixNetVConnection.cc:115
   #10 write_signal_done (event=103, nh=0x2b1a9a809d80, vc=0x2b1ac8035eb0) at UnixNetVConnection.cc:161
   #11 0x000000000077368d in write_to_net_io (nh=0x2b1a9a809d80, vc=0x2b1ac8035eb0, thread=0x2b1a9a806000) at UnixNetVConnection.cc:502
   #12 0x000000000075de88 in NetHandler::process_ready_list (this=this@entry=0x2b1a9a809d80) at UnixNet.cc:429
   #13 0x000000000075e15d in NetHandler::waitForActivity (this=0x2b1a9a809d80, timeout=<optimized out>) at UnixNet.cc:547
   #14 0x00000000007bbc0a in EThread::execute_regular (this=this@entry=0x2b1a9a806000) at UnixEThread.cc:266
   #15 0x00000000007bbe92 in EThread::execute (this=0x2b1a9a806000) at UnixEThread.cc:327
   #16 0x00000000007ba1e9 in spawn_thread_internal (a=0x2b1a97563d80) at Thread.cc:92
   #17 0x00002b1a954b6dd5 in start_thread () from /lib64/libpthread.so.0
   #18 0x00002b1a96267ead in clone () from /lib64/libc.so.6
   
   (gdb) f 4
   #4  0x000000000054a786 in buffer_size_to_index (max=<optimized out>, size=<optimized out>) at /home/svinukon/Traffic/ATS/ats9/ats-core_trunk/ats9/src/iocore/eventsystem/P_IOBuffer.h:46
   46	/home/svinukon/Traffic/ATS/ats9/ats-core_trunk/ats9/src/iocore/eventsystem/P_IOBuffer.h: No such file or directory.
   (gdb) p r
   $3 = <optimized out>
   (gdb) p max_iobuffer_size
   $4 = 4426017152009854830
   
   #5  HttpSM::do_setup_post_tunnel (this=0x2b1b62f6be90, to_vc_type=HTTP_SERVER_VC) at HttpSM.cc:5838
   5838	HttpSM.cc: No such file or directory.
   (gdb) p this->t_state.hdr_info.request_content_length
   $1 = 1105
   
   ```
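
One way to check whether out-of-bounds pointers such as `m_buffer = 0x303943372544494d` in the first dump above are random values or overwritten text is to look at their bytes as characters. The helper below is a hypothetical debugging aid, not part of Traffic Server; it assumes the value was read on a little-endian machine, so the lowest byte is the first byte in memory.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: renders a 64-bit value as the 8 bytes it occupies in
// memory on a little-endian machine (lowest byte first). Bytes outside the
// printable ASCII range are shown as '.'.
std::string bytes_as_text(uint64_t value)
{
  std::string out;
  for (int i = 0; i < 8; ++i) {
    unsigned char byte = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    bool printable     = byte >= 0x20 && byte < 0x7F;
    out.push_back(printable ? static_cast<char>(byte) : '.');
  }
  return out;
}
```

For example, `bytes_as_text(0x303943372544494d)` yields `MID%7C90` and `bytes_as_text(0x434d433725303234)` yields `420%7CMC`. Fully printable results like these suggest that string data was written over the object, which would be consistent with a buffer overrun, although this reading is an inference from the dump rather than something the dump proves.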





[GitHub] [trafficserver] masaori335 edited a comment on issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
masaori335 edited a comment on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-640297605


   Nice catch! Does this mean bad error handling when calling `buffer_size_to_index()` in this case?
   I don't object to bumping the default value or removing it, but even if we bump the default value or explicitly set the `max`, the `size` could be larger than it and the function returns an error.





[GitHub] [trafficserver] sudheerv closed issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
sudheerv closed issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849


   





[GitHub] [trafficserver] masaori335 closed issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
masaori335 closed issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849


   





[GitHub] [trafficserver] sudheerv commented on issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
sudheerv commented on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-647039331


   Ran ASAN in prod (couldn't run it for longer than an hour before maxing out the CPU) and in our staging/integration environment (for over a day; a slow burn, as the QPS is lower) and found a few heap corruption issues (in our internal plugins and libraries) and heap overflow issues (#6916).
   After fixing those, the occurrence of this crash dropped significantly (from once every 3-4 hours to 1-2 per day).
   
   Will open a separate issue for the findings on the last remaining crashes.





[GitHub] [trafficserver] sudheerv edited a comment on issue #6849: Crash in ssl_read_from_net due to assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
sudheerv edited a comment on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-638493946


   It looks like the root of the problem is that `IOBufferData::_size_index` is garbage (3833167203381096996). This value typically comes from the call to `new_MIOBuffer`, but reviewing the code that computes the `alloc_index` passed to `new_MIOBuffer`, I couldn't find a smoking gun yet for where it could become garbage.
   
   Specifically, the location recorded in the corrupted IOBufferData is HttpSM.cc:5840, and the logic in that code block that sets `alloc_index` doesn't seem to have any issues.
   
   ```
       int64_t alloc_index;
       // content length is undefined, use default buffer size
       if (t_state.hdr_info.request_content_length == HTTP_UNDEFINED_CL) {
         alloc_index = static_cast<int>(t_state.txn_conf->default_buffer_size_index);
         if (alloc_index < MIN_CONFIG_BUFFER_SIZE_INDEX || alloc_index > MAX_BUFFER_SIZE_INDEX) {
           alloc_index = DEFAULT_REQUEST_BUFFER_SIZE_INDEX;
         }
       } else {
         alloc_index = buffer_size_to_index(t_state.hdr_info.request_content_length);
       }
       MIOBuffer *post_buffer    = new_MIOBuffer(alloc_index);
   ```





[GitHub] [trafficserver] masaori335 commented on issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
masaori335 commented on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-640297605


   Nice catch! Does this mean bad error handling when calling `buffer_size_to_index()` in this case?
   I don't object to bumping the default value, but even if we bump the default value or explicitly set the `max`, the `size` could be larger than it and the function returns an error.





[GitHub] [trafficserver] sudheerv commented on issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
sudheerv commented on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-638585996


   This may be due to a root cause similar to that of
   https://github.com/apache/trafficserver/issues/6850
   
   I'm wondering if we should bump up the default value of `proxy.config.io.max_buffer_size` from 32K to something higher, probably 1M (index = 13) or even 2M (index = 14)?
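
The index values quoted above follow from a power-of-two mapping between a size index and a block size. The sketch below assumes index 0 corresponds to 128 bytes, which matches the figures in this comment (1M at index 13, 2M at index 14); the names `kBaseBlockSize`, `kMaxIndex`, and `index_to_size` are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kBaseBlockSize = 128; // assumed block size for index 0
constexpr int kMaxIndex          = 14;  // largest index mentioned in the thread

// Maps a buffer size index to its block size in bytes: each step doubles.
int64_t index_to_size(int index)
{
  assert(index >= 0 && index <= kMaxIndex);
  return kBaseBlockSize << index;
}
```

Under that assumption, the current 32K default corresponds to index 8, `index_to_size(13)` is 1048576 bytes (1M), and `index_to_size(14)` is 2097152 bytes (2M).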
   





[GitHub] [trafficserver] sudheerv commented on issue #6849: Crash in ssl_read_from_net due to assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
sudheerv commented on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-638493946


   It looks like the root of the problem is that `IOBufferData::_size_index` is garbage (3833167203381096996). This value typically comes from the call to `new_MIOBuffer`, but reviewing the code, the `alloc_index` passed to `new_MIOBuffer` seems fine, and I couldn't find a smoking gun yet for where it could become garbage.
   
   Specifically, the location recorded in the corrupted IOBufferData is HttpSM.cc:5840, and the logic in that code block that sets `alloc_index` doesn't seem to have any issues.





[GitHub] [trafficserver] masaori335 commented on issue #6849: ssl_read_from_net assert failure for null buffer block

Posted by GitBox <gi...@apache.org>.
masaori335 commented on issue #6849:
URL: https://github.com/apache/trafficserver/issues/6849#issuecomment-640317629


   Could you add an assert to make sure `max_iobuffer_size` is between 0 and MAX_BUFFER_SIZE_INDEX (14) at startup?
   ```
      max_iobuffer_size = buffer_size_to_index(config_max_iobuffer_size, DEFAULT_BUFFER_SIZES - 1);
   +  ink_release_assert(0 <= max_iobuffer_size && max_iobuffer_size <= MAX_BUFFER_SIZE_INDEX);
   ```
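
The same range check can be written as a small standalone helper, sketched below. It does not use the Traffic Server macros: `is_valid_size_index` and `require_valid_size_index` are hypothetical names, and the explicit `std::abort()` stands in for a release-mode assertion that stays active even when `NDEBUG` is defined.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

constexpr int64_t kMaxSizeIndex = 14; // upper bound quoted in the thread

// True when `index` lies in the valid range 0..kMaxSizeIndex.
bool is_valid_size_index(int64_t index)
{
  return index >= 0 && index <= kMaxSizeIndex;
}

// Aborts with a message when `index` is out of range. Unlike assert(), this
// check is not compiled out in optimized builds.
void require_valid_size_index(int64_t index, const char *where)
{
  if (!is_valid_size_index(index)) {
    std::fprintf(stderr, "%s: size index %lld is out of range\n", where,
                 static_cast<long long>(index));
    std::abort();
  }
}
```

A check like this at startup only covers initialization. Other comments in this thread suggest the value was corrupted later during the run, so the same check would also be needed at the points where the value is read.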

