Posted to issues@trafficserver.apache.org by "Sudheer Vinukonda (JIRA)" <ji...@apache.org> on 2014/12/28 17:21:13 UTC

[jira] [Comment Edited] (TS-2780) Core dump in SpdyRequest::clear() in production testing of SPDY

    [ https://issues.apache.org/jira/browse/TS-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055129#comment-14055129 ] 

Sudheer Vinukonda edited comment on TS-2780 at 12/28/14 4:20 PM:
-----------------------------------------------------------------

I am finally able to reproduce this error and understand the issue. It happens due to a race condition when the client sends a {{DATA}} frame while ATS is sending an error in {{SYN_REPLY}} (e.g. a POST transaction where the client tries to upload a large data file, but ATS denies POST transactions from that client via the ip_allow ACL).

{code}
E.g. % spdycat -3 -v https://abc.com -d /tmp/1 > /tmp/2 (where /tmp/1 is a large data file with about 5K lines). 
{code}

The issue stems from the code in {{spdy_on_data_chunk_recv_callback()}}, {{spdy_on_data_recv_callback()}}, etc. When the error occurs, the request is deleted and erased from {{SpdyClientSession->req_map}}. However, {{DATA}} frames may still arrive after the request object has been deleted. The code below correctly ignores those frames, since the request has already been deleted - but the statement {{SpdyRequest *req = sm->req_map[stream_id];}} has a side effect: {{std::map::operator[]}} inserts an element with key {{stream_id}} and value {{NULL}} into {{req_map}}. The correct way to check whether the request has been erased from the map is to call {{req_map.find(stream_id)}} and compare the returned iterator against {{req_map.end()}}, which has no insertion side effect. Will test and upload a patch shortly.

{code}
void
spdy_on_data_chunk_recv_callback(spdylay_session * /*session*/, uint8_t /*flags*/,
                                 int32_t stream_id, const uint8_t *data,
                                 size_t len, void *user_data)
{
  SpdyClientSession *sm = (SpdyClientSession *)user_data;
  SpdyRequest *req = sm->req_map[stream_id];
  //
  // SpdyRequest has been deleted on error, drop this data;
  //
  if (!req)
    return; 
{code}
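The {{operator[]}} side effect described above is easy to demonstrate in isolation. A minimal, self-contained sketch (the {{SpdyRequest}} stand-in and the stream ids 1/25/27 are hypothetical, chosen to mirror the gdb output below - this is not the actual patch):

{code}
#include <cassert>
#include <cstdint>
#include <map>

struct SpdyRequest {}; // stand-in for the real class

int main()
{
  std::map<int32_t, SpdyRequest *> req_map;
  req_map[1] = new SpdyRequest(); // stream 1 is still live
  // The request for stream 25 was erased on error; the map has no entry.

  // Buggy lookup: operator[] default-constructs a NULL value and INSERTS it.
  SpdyRequest *req = req_map[25];
  assert(req == NULL);
  assert(req_map.size() == 2); // a {25, NULL} entry was silently added

  // Correct lookup: find() has no insertion side effect.
  std::map<int32_t, SpdyRequest *>::iterator it = req_map.find(27);
  assert(it == req_map.end()); // stream 27 is simply "not found"
  assert(req_map.size() == 2); // map unchanged

  delete req_map[1];
  return 0;
}
{code}

Presumably the fix is to replace the {{operator[]}} lookups in these callbacks with a {{find()}}/{{end()}} check; the attached patch should confirm the exact shape.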





> Core dump in SpdyRequest::clear() in production testing of SPDY
> ---------------------------------------------------------------
>
>                 Key: TS-2780
>                 URL: https://issues.apache.org/jira/browse/TS-2780
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: SPDY
>            Reporter: Sudheer Vinukonda
>            Assignee: Bryan Call
>              Labels: spdy, yahoo
>             Fix For: 5.1.0
>
>         Attachments: TS-2780.diff, ts2780.diff
>
>
> Noticed the below core dump in SpdyRequest::clear() during production testing on our proxy platform.
> {code}
> NOTE: Traffic Server received Sig 11: Segmentation fault
> /home/y/bin/traffic_server - STACK TRACE: 
> /lib64/libpthread.so.0(+0x3420a0f500)[0x2b47cff7d500]
> /home/y/bin/traffic_server(_ZN11SpdyRequest5clearEv+0x14)[0x5d06d4]
> /home/y/bin/traffic_server(_ZN6SpdySM5clearEv+0x34)[0x5d0b84]
> /home/y/bin/traffic_server[0x5d13d1]
> /home/y/bin/traffic_server(_ZN18UnixNetVConnection14readSignalDoneEiP10NetHandler+0x3d)[0x6efbcd]
> /home/y/bin/traffic_server(_ZN17SSLNetVConnection11net_read_ioEP10NetHandlerP7EThread+0xc76)[0x6df3e6]
> /home/y/bin/traffic_server(_ZN10NetHandler12mainNetEventEiP5Event+0x1f2)[0x6e6b12]
> /home/y/bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x8f)[0x714aaf]
> /home/y/bin/traffic_server(_ZN7EThread7executeEv+0x493)[0x715453]
> /home/y/bin/traffic_server[0x713e5a]
> /lib64/libpthread.so.0(+0x3420a07851)[0x2b47cff75851]
> /lib64/libc.so.6(clone+0x6d)[0x34202e894d]
> {code}
> gdb output shows that req may be a null pointer in SpdySM::clear()
> {code}
> (gdb) bt
> #0  SpdyRequest::clear (this=0x0) at SpdySM.cc:44
> #1  0x00000000005d0b84 in SpdySM::clear (this=0x2aefdc4b46a0) at SpdySM.cc:116
> #2  0x00000000005d13d1 in spdy_default_handler (contp=<value optimized out>, event=TS_EVENT_VCONN_INACTIVITY_TIMEOUT, edata=<value optimized out>) at SpdySM.cc:258
> #3  0x00000000006eeabb in handleEvent (event=<value optimized out>, vc=0x2aefcc141ca0) at ../../iocore/eventsystem/I_Continuation.h:146
> #4  read_signal_and_update (event=<value optimized out>, vc=0x2aefcc141ca0) at UnixNetVConnection.cc:216
> #5  0x00000000006f3cd4 in UnixNetVConnection::mainEvent (this=0x2aefcc141ca0, event=<value optimized out>, e=<value optimized out>) at UnixNetVConnection.cc:1152
> #6  0x00000000006e7d0e in handleEvent (this=0x1634af0, event=<value optimized out>, e=0x136e850) at ../../iocore/eventsystem/I_Continuation.h:146
> #7  InactivityCop::check_inactivity (this=0x1634af0, event=<value optimized out>, e=0x136e850) at UnixNet.cc:67
> #8  0x0000000000714aaf in handleEvent (this=0x2aef48f91010, e=0x136e850, calling_code=2) at I_Continuation.h:146
> #9  EThread::process_event (this=0x2aef48f91010, e=0x136e850, calling_code=2) at UnixEThread.cc:145
> #10 0x0000000000715623 in EThread::execute (this=0x2aef48f91010) at UnixEThread.cc:224
> #11 0x0000000000713e5a in spawn_thread_internal (a=0x163d000) at Thread.cc:88
> #12 0x00002aed0a8a1851 in start_thread () from /lib64/libpthread.so.0
> #13 0x00000034202e894d in clone () from /lib64/libc.so.6
> (gdb) print this
> $1 = (SpdyRequest * const) 0x0
> (gdb) up
> #1  0x00000000005d0b84 in SpdySM::clear (this=0x2aefdc4b46a0) at SpdySM.cc:116
> 116	in SpdySM.cc
> (gdb) print req
> $2 = (SpdyRequest *) 0x0
> (gdb) print iter
> $3 = {first = 25, second = }
> (gdb) print req_map
> $4 = std::map with 3 elements = {
>   [1] = 0x2aefd81e5ae0,
>   [25] = 0x0,
>   [27] = 0x0
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)