You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by "Peter Chou (JIRA)" <ji...@apache.org> on 2016/05/24 00:23:13 UTC

[jira] [Created] (TS-4475) Crash in Log-Collation client after using inactivity-cop.

Peter Chou created TS-4475:
------------------------------

             Summary: Crash in Log-Collation client after using inactivity-cop.
                 Key: TS-4475
                 URL: https://issues.apache.org/jira/browse/TS-4475
             Project: Traffic Server
          Issue Type: Bug
          Components: Logging
            Reporter: Peter Chou


Background: We recently tried making use of inactivity-cop by setting it to 300s instead of the default one-day setting. This was to address an issue where, under heavy load, ATS would become un-responsive to client requests, and the condition would persist after traffic was stopped with the active queue saying 0 connections but 'netstat -na' showing a bunch of established connections (up to the throttle limit approximately).

Inactivity cop seemed to help ATS handle this situation, but we have since experienced a couple of core dumps over the last four day period. It seems occasionally the Log Collation Client State Machine will have event value 105 or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() it tries to call the continuation handler which down the line does not know about this event thus causing core dump !"unexpcted state" [sic].

Here is the back-trace --

(gdb) bt
#0  0x00002b67cd5405f7 in raise () from /lib64/libc.so.6
#1  0x00002b67cd541e28 in abort () from /lib64/libc.so.6
#2  0x00002b67cb032921 in ink_die_die_die () at ink_error.cc:43
#3  0x00002b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
#4  0x00002b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: failed assert `%s`") at ink_error.cc:73
#5  0x00002b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted state\"", file=0x7fb35b "LogCollationClientSM.cc",
    line=445) at ink_assert.cc:37
#6  0x000000000069c86b in LogCollationClientSM::client_idle (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
#7  0x000000000069b427 in LogCollationClientSM::client_handler (this=0x2b681400bb00, event=105, data=0x2b680c017020)
    at LogCollationClientSM.cc:119
#8  0x0000000000502cc6 in Continuation::handleEvent (this=0x2b681400bb00, event=105, data=0x2b680c017020)
    at ../iocore/eventsystem/I_Continuation.h:153
#9  0x0000000000783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
#10 0x0000000000787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
#11 0x0000000000502cc6 in Continuation::handleEvent (this=0x2b680c016f00, event=1, data=0x127ad60)
    at ../iocore/eventsystem/I_Continuation.h:153
#12 0x000000000077d943 in InactivityCop::check_inactivity (this=0x1209a00, event=2, e=0x127ad60) at UnixNet.cc:102
#13 0x0000000000502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, data=0x127ad60)
    at ../iocore/eventsystem/I_Continuation.h:153
#14 0x00000000007a5df6 in EThread::process_event (this=0x2b67cf7bb010, e=0x127ad60, calling_code=2) at UnixEThread.cc:128
#15 0x00000000007a61f5 in EThread::execute (this=0x2b67cf7bb010) at UnixEThread.cc:207
#16 0x0000000000534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918

I believe it takes a wrong turn here --

#9  0x0000000000783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
150         vc->read.vio._cont->handleEvent(event, &vc->read.vio);
(gdb) list
145     static inline int
146     read_signal_and_update(int event, UnixNetVConnection *vc)
147     {
148       vc->recursion++;
149       if (vc->read.vio._cont) {
150         vc->read.vio._cont->handleEvent(event, &vc->read.vio);
151       } else {
152         switch (event) {
153         case VC_EVENT_EOS:
154         case VC_EVENT_ERROR:
(gdb) list
155         case VC_EVENT_ACTIVE_TIMEOUT:
156         case VC_EVENT_INACTIVITY_TIMEOUT:
157           Debug("inactivity_cop", "event %d: null read.vio cont, closing vc %p", event, vc);
158           vc->closed = 1;
159           break;
160         default:
161           Error("Unexpected event %d for vc %p", event, vc);
162           ink_release_assert(0);
163           break;
164         }

Note: I understand that there were several issues related to TS-3196 concerning inactivity_cop and this section of code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)