Posted to issues@trafficserver.apache.org by "kang li (JIRA)" <ji...@apache.org> on 2014/12/18 07:36:13 UTC

[jira] [Comment Edited] (TS-3235) PluginVC crashed with unrecognized event

    [ https://issues.apache.org/jira/browse/TS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251285#comment-14251285 ] 

kang li edited comment on TS-3235 at 12/18/14 6:35 AM:
-------------------------------------------------------

This turned out to be a race condition: two threads race to create the {{sm_lock_retry_event}}. This event may be created by {{PluginVC::do_io_read}}, {{PluginVC::do_io_write}}, {{PluginVC::reenable}} and {{PluginVC::do_io_close}}, all of which go through {{PluginVC::setup_event_cb}}.
{code}
void
PluginVC::setup_event_cb(ink_hrtime in, Event ** e_ptr)
{

  ink_assert(magic == PLUGIN_VC_MAGIC_ALIVE);

  if (*e_ptr == NULL) {

    // We locked the pointer so we can now allocate an event
    //   to call us back
    if (in == 0) { 
      if(this_ethread()->tt == REGULAR) {
        *e_ptr = this_ethread()->schedule_imm_local(this);
      }     
{code}

The core dump seems to be caused by this scenario:
Thread 1 and thread 2 both check {{*e_ptr}}, see NULL, and both proceed to create a {{sm_lock_retry_event}}. Thread 2 creates event2 and overwrites event1, which thread 1 had just created, so the PluginVC's {{sm_lock_retry_event}} now points to event2. When event1 is later dispatched back to the PluginVC, it is not recognized as a valid event, which triggers the assertion and the core dump (see the illustrative timeline below).
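To spell out the interleaving (comments only, not actual code; the assert location is taken from the backtrace below):
{code}
// Thread 1                               Thread 2
// sees *e_ptr == NULL
//                                        sees *e_ptr == NULL
// *e_ptr = event1 (scheduled)
//                                        *e_ptr = event2 (overwrites event1)
//
// Later, event1 fires and reaches PluginVC::main_handler, but
// sm_lock_retry_event now holds event2, so event1 matches none of the
// tracked event pointers and the assert at PluginVC.cc:203 fails.
{code}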

We made a simple fix that takes a mutex and double-checks {{*e_ptr}} in {{PluginVC::setup_event_cb(ink_hrtime in, Event ** e_ptr)}}, roughly as sketched below. With this change the core dump has not happened again.
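A rough sketch of that guard, at the spot shown in the excerpt above; the {{setup_event_lock}} member and the use of {{ink_mutex}} here are illustrative, not the exact patch:
{code}
  if (*e_ptr == NULL) {
    ink_mutex_acquire(&setup_event_lock);   // illustrative mutex member added for the fix
    if (*e_ptr == NULL) {                   // re-check now that we hold the lock
      if (in == 0) {
        if (this_ethread()->tt == REGULAR) {
          *e_ptr = this_ethread()->schedule_imm_local(this);
        }
        // ... remaining scheduling branches unchanged ...
      }
      // ... non-zero delay branch unchanged ...
    }
    ink_mutex_release(&setup_event_lock);
  }
{code}
Only one thread can now move {{*e_ptr}} from NULL to a scheduled event; a second caller sees the non-NULL pointer and does not schedule a duplicate.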

I am not sure whether this is the proper fix, since we may not be using the intercept plugin in the recommended pattern: our plugin calls InterceptPlugin::produce from a separate thread to write back the response, and the core dump may be related to that usage. A minimal sketch of that usage follows.
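For context, a minimal sketch of the pattern, assuming the atscppapi {{InterceptPlugin}} interface; the class name, the worker-thread plumbing and the {{buildResponse}} helper are hypothetical:
{code}
#include <atscppapi/InterceptPlugin.h>
#include <atscppapi/Transaction.h>
#include <string>
#include <thread>

using namespace atscppapi;

// Hypothetical response builder; stands in for our real application logic.
static std::string buildResponse(const std::string &request) {
  return "echo: " + request;
}

class MyIntercept : public InterceptPlugin {
public:
  explicit MyIntercept(Transaction &transaction)
    : InterceptPlugin(transaction, InterceptPlugin::SERVER_INTERCEPT) { }

  void consume(const std::string &data, InterceptPlugin::RequestDataType /* type */) {
    request_.append(data);
  }

  void handleInputComplete() {
    // Hand the request off to another thread; that thread later calls
    // produce() / setOutputComplete() on this plugin, i.e. from outside
    // the transaction's event thread.
    std::thread([this]() {
      std::string body = buildResponse(request_);
      produce("HTTP/1.1 200 OK\r\nContent-Length: " +
              std::to_string(body.size()) + "\r\n\r\n" + body);
      setOutputComplete();
    }).detach();
  }

private:
  std::string request_;
};
{code}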



> PluginVC crashed with unrecognized event
> ----------------------------------------
>
>                 Key: TS-3235
>                 URL: https://issues.apache.org/jira/browse/TS-3235
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: CPP API, HTTP, Plugins
>            Reporter: kang li
>            Assignee: Brian Geffon
>             Fix For: 5.3.0
>
>
> We are using atscppapi to create an intercept plugin.
>  
> From the core dump, it seems the Continuation of the InterceptPlugin had already been destroyed.
> {code}
> #0  0x000000375ac32925 in raise () from /lib64/libc.so.6
> #1  0x000000375ac34105 in abort () from /lib64/libc.so.6
> #2  0x00002b21eeae3458 in ink_die_die_die (retval=1) at ink_error.cc:43
> #3  0x00002b21eeae3525 in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, 
>     message_format=0x2b21eeaf08d8 "%s:%d: failed assert `%s`", ap=0x2b21f4913ad0) at ink_error.cc:65
> #4  0x00002b21eeae35ee in ink_fatal (return_code=1, message_format=0x2b21eeaf08d8 "%s:%d: failed assert `%s`") at ink_error.cc:73
> #5  0x00002b21eeae2160 in _ink_assert (expression=0x76ddb8 "call_event == core_lock_retry_event", file=0x76dd04 "PluginVC.cc", line=203)
>     at ink_assert.cc:37
> #6  0x0000000000530217 in PluginVC::main_handler (this=0x2b24ef007cb8, event=1, data=0xe0f5b80) at PluginVC.cc:203
> #7  0x00000000004f5854 in Continuation::handleEvent (this=0x2b24ef007cb8, event=1, data=0xe0f5b80) at ../iocore/eventsystem/I_Continuation.h:146
> #8  0x0000000000755d26 in EThread::process_event (this=0x309b250, e=0xe0f5b80, calling_code=1) at UnixEThread.cc:145
> #9  0x000000000075610a in EThread::execute (this=0x309b250) at UnixEThread.cc:239
> #10 0x0000000000755284 in spawn_thread_internal (a=0x2849330) at Thread.cc:88
> #11 0x00002b21ef05f9d1 in start_thread () from /lib64/libpthread.so.0
> #12 0x000000375ace8b7d in clone () from /lib64/libc.so.6
> (gdb) p sm_lock_retry_event
> $13 = (Event *) 0x2b2496146e90
> (gdb) p core_lock_retry_event
> $14 = (Event *) 0x0
> (gdb) p active_event
> $15 = (Event *) 0x0
> (gdb) p inactive_event
> $16 = (Event *) 0x0
> (gdb) p *(INKContInternal*)this->core_obj->connect_to
> Cannot access memory at address 0x2b269cd46c10
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)