Posted to issues@trafficserver.apache.org by "Susan Hinrichs (JIRA)" <ji...@apache.org> on 2014/11/03 23:57:38 UTC

[jira] [Commented] (TS-3105) Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 5.1 and beyond

    [ https://issues.apache.org/jira/browse/TS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195298#comment-14195298 ] 

Susan Hinrichs commented on TS-3105:
------------------------------------

Last Friday, while working on the patches for 5.1, I ran into the following issues.

VC_EVENT_EOS was being delivered to consumer_handler in some cases during a POST workload. There appear to be two causes.

1. The consumer's associated VC belongs to the HttpServerSession. The POST response is very short (one packet) and arrives before the tunnel for the second server response is set up. Since no producer matches the VC, the event is delivered to the consumer of the first tunnel instead. I fixed this by changing the do_io_read in HttpSM::attach_server_session to read zero bytes. This is enough to redirect error and timeout events to the new VC handler, but nothing is read until the server response tunnel is in place and a second do_io_read is issued in HttpSM::setup_server_read_response_header. With this change, events from the second tunnel are delivered to the second tunnel's producer.
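The routing fix in case 1 can be illustrated with a toy model. This is not the Traffic Server API: `ToyVC`, `Ev`, `Handler`, and `run_scenario` are hypothetical stand-ins, and the model assumes only what the comment describes, namely that each do_io_read rebinds the handler that receives the VC's events, and that a zero-byte read keeps error and timeout routing while consuming no data. In the replay, the early one-packet response is held until the second read is issued, so it reaches the second tunnel's handler rather than a stale one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for VC event codes (not the real ATS values).
enum class Ev { ReadReady, InactivityTimeout };

using Handler = std::function<void(Ev, const std::string &)>;

// Toy model of a net VC. do_io_read() rebinds the handler that receives
// events; nbytes limits how much buffered data may be consumed into buf.
class ToyVC {
public:
  void do_io_read(Handler h, int64_t nbytes, std::string *buf) {
    handler_ = std::move(h);
    nbytes_ = nbytes;
    buf_ = buf;
    drain(); // hand over bytes that arrived while reading was disabled
  }
  void on_data(const std::string &data) { // the "network" delivers bytes
    pending_ += data;
    drain();
  }
  void on_timeout() { // timeouts go to the bound handler regardless of nbytes
    if (handler_) handler_(Ev::InactivityTimeout, "");
  }

private:
  void drain() {
    if (nbytes_ <= 0 || buf_ == nullptr || pending_.empty()) return;
    auto n = static_cast<size_t>(
        std::min(nbytes_, static_cast<int64_t>(pending_.size())));
    std::string chunk = pending_.substr(0, n);
    pending_.erase(0, n);
    nbytes_ -= static_cast<int64_t>(n);
    buf_->append(chunk);
    handler_(Ev::ReadReady, chunk);
  }
  Handler handler_;
  int64_t nbytes_ = 0;
  std::string *buf_ = nullptr;
  std::string pending_;
};

// Hypothetical replay of case 1: the short response arrives before the
// second tunnel's read is set up.
std::vector<std::string> run_scenario() {
  ToyVC server_vc;
  std::vector<std::string> log;
  std::string resp;
  // Analogous to attach_server_session after the fix: bind, read 0 bytes.
  server_vc.do_io_read(
      [&](Ev e, const std::string &) {
        log.push_back(e == Ev::InactivityTimeout ? "sm:timeout" : "sm:data");
      },
      0, nullptr);
  server_vc.on_data("HTTP/1.1 200 OK\r\n\r\n"); // held, not consumed yet
  server_vc.on_timeout();                       // still reaches the new handler
  // Analogous to setup_server_read_response_header: start reading now.
  server_vc.do_io_read(
      [&](Ev, const std::string &d) { log.push_back("tunnel2:" + d); },
      INT64_MAX, &resp);
  return log;
}
```

Passing `nullptr` as the buffer for the zero-byte read mirrors the NULL-argument cleanup described next; with nbytes at 0 the toy never touches the buffer.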

While poking around in this logic, I also noticed that the call to do_io_read in HttpSM::attach_client_session was passing a length of 0 but a non-null buffer. I changed the third argument to NULL.

2. In the second case, a RESET is performed and delivered as a VC_EVENT_EOS. I was exercising this by sending a reset from the client side. This means the EOS delivered to consumer_handler should indeed be treated as an error case.
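A minimal sketch of the classification in case 2, assuming a consumer (write-side) handler that maps each event to continue, done, or abort. `VcEvent`, `Outcome`, and `classify_consumer_event` are hypothetical names, not Traffic Server's; the point is only that an EOS seen by a consumer, as produced by the client-side reset described above, takes the same path as an explicit error.

```cpp
#include <cassert>

// Hypothetical subset of events a tunnel consumer might receive.
enum class VcEvent { WriteReady, WriteComplete, Eos, Error };

enum class Outcome { Continue, Done, Abort };

// A consumer only writes, so EOS means the peer went away (for example
// after a reset); it is treated like an error rather than normal completion.
Outcome classify_consumer_event(VcEvent e) {
  switch (e) {
  case VcEvent::WriteReady:
    return Outcome::Continue;
  case VcEvent::WriteComplete:
    return Outcome::Done;
  case VcEvent::Eos:
  case VcEvent::Error:
    return Outcome::Abort;
  }
  return Outcome::Abort; // not reached; keeps compilers from warning
}
```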



> Combination of fixes for TS-3084 and TS-3073 causing asserts and segfaults on 5.1 and beyond
> --------------------------------------------------------------------------------------------
>
>                 Key: TS-3105
>                 URL: https://issues.apache.org/jira/browse/TS-3105
>             Project: Traffic Server
>          Issue Type: Bug
>            Reporter: Susan Hinrichs
>            Assignee: Susan Hinrichs
>             Fix For: 5.2.0
>
>         Attachments: ts-3073-and-3084-and-3105-against-510.patch, ts-3105-master-6.patch
>
>
> These two patches ran in a production environment on top of 5.0.1 without problems for several weeks. Now, running these patches on top of 5.1 causes either an assert or a segfault. Another person has reported the same segfault when running master in a production environment.
> In the assert, the handler_state of the producers is 0 (UNKNOWN) rather than the expected terminal state. I'm assuming that either we are being directed into the terminal state by a connection that terminates too quickly, or an event has hung around too long and is being executed against the state machine after it has been recycled.
> The event is HTTP_TUNNEL_EVENT_DONE.
> The assert stack trace is:
> FATAL: HttpSM.cc:2632: failed assert `0`
> /z/bin/traffic_server - STACK TRACE:
> /z/lib/libtsutil.so.5(+0x25197)[0x2b8bd08dc197]
> /z/lib/libtsutil.so.5(+0x23def)[0x2b8bd08dadef]
> /z/bin/traffic_server(HttpSM::tunnel_handler_post_or_put(HttpTunnelProducer*)+0xcd)[0x5982ad]
> /z/bin/traffic_server(HttpSM::tunnel_handler_post(int, void*)+0x86)[0x5a32d6]
> /z/bin/traffic_server(HttpSM::main_handler(int, void*)+0xd8)[0x5a1e18]
> /z/bin/traffic_server(HttpTunnel::main_handler(int, void*)+0xee)[0x5dd6ae]
> /z/bin/traffic_server(write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*)+0x136e)[0x721d1e]
> /z/bin/traffic_server(NetHandler::mainNetEvent(int, Event*)+0x28c)[0x7162fc]
> /z/bin/traffic_server(EThread::process_event(Event*, int)+0x91)[0x744df1]
> /z/bin/traffic_server(EThread::execute()+0x4fc)[0x7458ac]
> /z/bin/traffic_server[0x7440ca]
> /lib64/libpthread.so.0(+0x7034)[0x2b8bd1ee4034]
> /lib64/libc.so.6(clone+0x6d)[0x2b8bd2c2875d]
> The segfault stack trace is:
> /z/bin/traffic_server - STACK TRACE: 
> /lib64/libpthread.so.0(+0xf280)[0x2abccd0d8280]
> /z/bin/traffic_server(HttpSM::tunnel_handler_ua(int, HttpTunnelConsumer*)+0x122)[0x591462]
> /z/bin/traffic_server(HttpTunnel::consumer_handler(int, HttpTunnelConsumer*)+0x9e)[0x5dd15e]
> /z/bin/traffic_server(HttpTunnel::main_handler(int, void*)+0x117)[0x5dd6d7]
> /z/bin/traffic_server(UnixNetVConnection::mainEvent(int, Event*)+0x3f0)[0x725190]
> /z/bin/traffic_server(InactivityCop::check_inactivity(int, Event*)+0x275)[0x716b75]
> /z/bin/traffic_server(EThread::process_event(Event*, int)+0x91)[0x744df1]
> /z/bin/traffic_server(EThread::execute()+0x2fb)[0x7456ab]
> /z/bin/traffic_server[0x7440ca]
> /lib64/libpthread.so.0(+0x7034)[0x2abccd0d0034]
> /lib64/libc.so.6(clone+0x6d)[0x2abccde1475d]
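The assert in the quoted report fires because a producer is still in state 0 (UNKNOWN) when HTTP_TUNNEL_EVENT_DONE is handled. The sketch below states that invariant as a check that returns false instead of asserting. `HandlerState`, `Producer`, and `all_producers_terminal` are hypothetical simplifications; the actual handler_state values used by HttpSM are not reproduced here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical producer states; 0 mirrors the UNKNOWN value in the report.
enum class HandlerState { Unknown = 0, Done, Fail, Abort };

struct Producer {
  HandlerState handler_state = HandlerState::Unknown;
};

// In this simplified model every state except Unknown is terminal.
bool is_terminal(HandlerState s) { return s != HandlerState::Unknown; }

// On a "tunnel done" event every producer is expected to be terminal.
// Returns false in the situation where the reported code hits `assert(0)`.
bool all_producers_terminal(const std::vector<Producer> &ps) {
  for (const auto &p : ps)
    if (!is_terminal(p.handler_state)) return false;
  return true;
}
```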



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)