You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by "John Plevyak (Commented) (JIRA)" <ji...@apache.org> on 2011/12/06 00:16:40 UTC

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163156#comment-13163156 ] 

John Plevyak commented on TS-937:
---------------------------------

This patch shouldn't have any effect in correct code.  Checking ->canceled outside of the mutex is at best an optimization.  In any case, before the continuation is used the canceled field has to be checked under the lock.  Most likely the core problem is that someone is canceling the event without holding the lock (a situation which subsumes the case of being in a callback as weijin suggested).  If this happens, there is a potential race where process_event is in progress, someone cancels the event (without the lock) and deletes the Continuation resulting in a crash.

However, this patch does reduce the change of a race condition in the debug build.   The core problem is that the macro MUTEX_LOCK_FOR is accessing HANDLER_NAME
which dereferences the continuation.  This is bad.  The macro needs to be fixed to delay deferencing the Continuation until the lock is taken.
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff64fa700 (LWP 28518)]
> 0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
> 130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
> Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
> (gdb) bt
> #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
> #1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
> #2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
> #3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
> #4  0x000000361f8e577d in clone () from /lib64/libc.so.6
> (gdb) bt full
> #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
>         lock = {m = {m_ptr = 0x7ffff64f9d20}, lock_acquired = 202}
> #1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
>         done_one = false
>         e = 0x1db45c0
>         NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
>         next_time = 1314647904419648000
> #2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
>         p = 0xfb7e80
> #3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x000000361f8e577d in clone () from /lib64/libc.so.6
> No symbol table info available.
> (gdb) f 0
> #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
> 130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
> (gdb) p *e
> $2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x7ffff68ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, 
>   immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira