Posted to issues@trafficserver.apache.org by "Brian Geffon (JIRA)" <ji...@apache.org> on 2011/08/29 23:39:37 UTC

[jira] [Created] (TS-937) EThread::execute still processing cancelled thread

EThread::execute still processing cancelled thread
--------------------------------------------------

                 Key: TS-937
                 URL: https://issues.apache.org/jira/browse/TS-937
             Project: Traffic Server
          Issue Type: Bug
          Components: Core
    Affects Versions: 2.1.9, 3.0.1
         Environment: RHEL6
            Reporter: Brian Geffon
         Attachments: UnixEThread.patch

The included GDB log shows that ATS is trying to process an event that has already been cancelled. Examining UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled.

Brian



Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff64fa700 (LWP 28518)]
0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
(gdb) bt
#0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
#1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
#2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
#3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
#4  0x000000361f8e577d in clone () from /lib64/libc.so.6
(gdb) bt full
#0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
        lock = {m = {m_ptr = 0x7ffff64f9d20}, lock_acquired = 202}
#1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
        done_one = false
        e = 0x1db45c0
        NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
        next_time = 1314647904419648000
#2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
        p = 0xfb7e80
#3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x000000361f8e577d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) f 0
#0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
(gdb) p *e
$2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x7ffff68ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, 
  immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}
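
A minimal sketch of the kind of check described above. The attached UnixEThread.patch is not reproduced in this thread, so this is not the actual patch; the Event struct and drain_queue function are hypothetical stand-ins, not the real Traffic Server classes. It only illustrates skipping an event whose cancelled flag is set before dispatching it.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical stand-in for the real Event class (assumption).
struct Event {
  int id;
  bool cancelled; // mirrors the `cancelled = 1` field in the gdb dump
};

// Drain a queue, dispatching only events that are not cancelled.
// Returns the ids of the events that were dispatched.
std::vector<int> drain_queue(std::deque<Event> &queue) {
  std::vector<int> dispatched;
  while (!queue.empty()) {
    Event e = queue.front();
    queue.pop_front();
    if (e.cancelled) {
      // Skip: the continuation may already be gone, so do not touch it.
      continue;
    }
    assert(!e.cancelled);       // invariant: never dispatch a cancelled event
    dispatched.push_back(e.id); // stand-in for process_event(e, ...)
  }
  return dispatched;
}
```

In this sketch a cancelled event is simply dropped; what the real code has to do with it (for example, freeing it) is outside what the thread shows.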



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "weijin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096661#comment-13096661 ] 

weijin commented on TS-937:
---------------------------

@Brian Geffon, yes, EventQueueExternal is used for events coming from other threads. In my opinion, the cancel action should be performed only when the event's continuation is called back. The cancel action is not thread-safe; see I_Action.h.
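
One possible reading of this comment is that cancelling is only safe while the continuation's lock is held, as it is during a callback. The sketch below illustrates that reading with std::mutex; Continuation, Action, cancel_locked and dispatch_locked are hypothetical simplified names, not the declarations in I_Action.h.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical simplified types (assumption), not the real ATS classes.
struct Continuation {
  std::mutex mutex;
  int calls = 0; // counts how many times the callback ran
};

struct Action {
  Continuation *continuation = nullptr;
  bool cancelled = false;
};

// Cancel while holding the continuation's lock, so the flag cannot
// change in the middle of a dispatch that holds the same lock.
void cancel_locked(Action &a) {
  assert(a.continuation != nullptr);
  std::lock_guard<std::mutex> guard(a.continuation->mutex);
  a.cancelled = true;
}

// Dispatch re-checks the flag under the same lock before calling back.
// Returns true if the callback ran.
bool dispatch_locked(Action &a) {
  assert(a.continuation != nullptr);
  std::lock_guard<std::mutex> guard(a.continuation->mutex);
  if (a.cancelled) {
    return false;
  }
  ++a.continuation->calls; // stand-in for the real callback
  return true;
}
```

Because both functions take the same lock, a dispatcher either sees the flag before the cancel or after it, never a half-finished state.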

> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.1
>
>         Attachments: UnixEThread.patch
>
>

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164858#comment-13164858 ] 

Brian Geffon commented on TS-937:
---------------------------------

FYI, apparently this patch was included in 3.0.2, but there was no reference to it in the change log, which could get confusing down the road.
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166662#comment-13166662 ] 

John Plevyak commented on TS-937:
---------------------------------

Let's nuke TS_HAS_PURIFY.  If we want to make a valgrind target (e.g. something which would enable normal malloc) then that is an idea, but this macro is confusing.
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon updated TS-937:
----------------------------

    Comment: was deleted

(was: So, I've begun to look into this bug again. To try to determine where the action is being cancelled, I modified Action to add a const char * volatile cancelled_by; and then simply replaced every instance of ->cancel() to pass the name of the method doing the cancelling:
)
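
The code that followed the deleted comment is not preserved in this thread. As a rough illustration of the tracing idea it describes, here is a sketch under stated assumptions: Action is a hypothetical simplified struct, close_connection is an illustrative call site, and keeping only the first canceller is a choice made for this example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical simplified Action (assumption); the real class has more members.
struct Action {
  bool cancelled = false;
  const char *volatile cancelled_by = nullptr; // field named in the comment

  // The caller passes its own name, so a debugger dump of the Action
  // shows which method performed the cancel.
  void cancel(const char *caller) {
    assert(caller != nullptr);
    if (!cancelled) {
      cancelled = true;
      cancelled_by = caller; // only the first canceller is recorded
    }
  }
};

// Illustrative call site: __func__ expands to the enclosing function's name.
void close_connection(Action &a) { a.cancel(__func__); }
```

With a field like this in place, printing the event in gdb (as in `p *e` above) would also show the recorded caller name next to the cancelled flag.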
    
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch
>
>

[jira] [Updated] (TS-937) EThread::execute still processing cancelled thread

Posted by "Brian Geffon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon updated TS-937:
----------------------------

    Attachment: UnixEThread.patch

> EThread::execute still processing cancelled thread
> --------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>         Attachments: UnixEThread.patch
>
>

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095466#comment-13095466 ] 

Brian Geffon commented on TS-937:
---------------------------------

@weijin, is EventQueueExternal the queue for events coming from other threads? I've been using the patch for a few days now and it has completely resolved the segfaults. Do you have ideas about what else I would need to check for? If so, I can create a new patch.

Brian

> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.1
>
>         Attachments: UnixEThread.patch
>
>

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon updated TS-937:
----------------------------

    Fix Version/s:     (was: 3.1.2)
                   3.1.3
    
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch
>
>

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon updated TS-937:
----------------------------

    Attachment: ts937.full.patch

Attaching a new patch (ts937.full.patch), which removes the TS_HAS_PURIFY and HANDLER_NAME macros entirely. I have tested it against trunk, and it passes regression in both debug and non-debug builds.
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch, ts937.full.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-937:
-----------------------------

    Fix Version/s: 3.1.1

> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.1
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Igor Galić (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić updated TS-937:
--------------------------

    Description: 
The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 

Brian


{noformat}
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff64fa700 (LWP 28518)]
0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
(gdb) bt
#0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
#1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
#2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
#3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
#4  0x000000361f8e577d in clone () from /lib64/libc.so.6
(gdb) bt full
#0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
        lock = {m = {m_ptr = 0x7ffff64f9d20}, lock_acquired = 202}
#1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
        done_one = false
        e = 0x1db45c0
        NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
        next_time = 1314647904419648000
#2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
        p = 0xfb7e80
#3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x000000361f8e577d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) f 0
#0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
(gdb) p *e
$2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x7ffff68ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, 
  immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}
{noformat}

  was:
The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 

Brian



Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff64fa700 (LWP 28518)]
0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
(gdb) bt
#0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
#1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
#2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
#3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
#4  0x000000361f8e577d in clone () from /lib64/libc.so.6
(gdb) bt full
#0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
        lock = {m = {m_ptr = 0x7ffff64f9d20}, lock_acquired = 202}
#1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
        done_one = false
        e = 0x1db45c0
        NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
        next_time = 1314647904419648000
#2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
        p = 0xfb7e80
#3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x000000361f8e577d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) f 0
#0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
(gdb) p *e
$2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x7ffff68ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, 
  immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}



    
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch, ts937.full.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


       

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Leif Hedstrom (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-937:
-----------------------------

    Fix Version/s:     (was: 3.1.1)
                   3.1.2

Moving all unassigned bugs out to 3.1.2
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon updated TS-937:
----------------------------

    Comment: was deleted

(was: Ignore notifications about a comment talking about inclusion in 3.0.2; I was mistaken.)
    
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon updated TS-937:
----------------------------

    Comment: was deleted

(was: FYI, apparently this patch was included in 3.0.2, but there was no reference to it in the change log, which could get confusing down the road.)
    
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198532#comment-13198532 ] 

Brian Geffon commented on TS-937:
---------------------------------

It appears that the event being cancelled is an event callback related to a mutex being held; see PluginVC.cc:489.
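
For illustration only, here is a minimal sketch of the kind of guard the issue description asks for: checking the cancelled flag before the event's mutex is touched. The GDB output shows `cancelled = 1` on the event while the crash is on the `MUTEX_TRY_LOCK_FOR` line, so the sketch skips a cancelled event before attempting the lock. The `Event` struct, the `Outcome` enum, and this `process_event` signature are simplified, hypothetical stand-ins, not the Traffic Server types and not the contents of the attached patch.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical, simplified stand-in for the real Event type.
struct Event {
  std::mutex *mutex = nullptr;         // may be invalid once the event is cancelled
  std::atomic<bool> cancelled{false};  // set by whoever cancels the event
  int calls = 0;                       // counts handler invocations, for illustration
};

enum class Outcome { Skipped, Retry, Dispatched };

// Assumed dispatch logic: look at the cancelled flag before touching e.mutex.
Outcome process_event(Event &e) {
  if (e.cancelled) {
    return Outcome::Skipped;  // never dereference the mutex of a cancelled event
  }
  std::unique_lock<std::mutex> lock(*e.mutex, std::try_to_lock);
  if (!lock.owns_lock()) {
    return Outcome::Retry;    // a real event loop would reschedule the event here
  }
  if (e.cancelled) {
    return Outcome::Skipped;  // re-check now that the lock is held
  }
  ++e.calls;                  // stands in for calling the continuation's handler
  return Outcome::Dispatched;
}
```

The early check only avoids the dereference; it does not explain why the event was cancelled while still queued, which is the question raised elsewhere in this thread.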
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "weijin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096662#comment-13096662 ] 

weijin commented on TS-937:
---------------------------

You need to find where this event is being canceled. If it is not canceled from within the continuation's callback, that's the bug.

> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.1
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon updated TS-937:
----------------------------

    Comment: was deleted

(was: So it turns out that this bug is fixed in TS-1074, I've verified this and am closing this bug.)
    
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198522#comment-13198522 ] 

Brian Geffon commented on TS-937:
---------------------------------

So, I've begun to look into this bug again. To try to determine where the action is being canceled, I modified Action to add a const char * volatile cancelled_by; member and then changed every ->cancel() call to pass the name of the method doing the cancelling:
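
The instrumentation described above was not preserved in this message. A minimal sketch of the idea follows, assuming a much simplified Action; the cancel() signature, the CANCEL_ACTION macro, and close_connection() are hypothetical names for illustration, not the actual Traffic Server code or the author's patch:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the Traffic Server Action class; only the
// fields relevant to this debugging aid are modeled.
struct Action {
  volatile bool cancelled = false;
  // Debug-only field: name of the function that cancelled this action.
  const char *volatile cancelled_by = nullptr;

  // cancel() records the caller's name so a later "p *e" in gdb shows it.
  void cancel(const char *caller) {
    if (!cancelled) {
      cancelled_by = caller;
      cancelled = true;
    }
  }
};

// Convenience macro (an assumption, not from the thread) so that call
// sites do not have to spell out their own function name.
#define CANCEL_ACTION(a) (a)->cancel(__func__)

// Example call site: whoever cancels the action leaves its name behind.
inline void close_connection(Action *pending) {
  CANCEL_ACTION(pending);
}
```

With something like this in place, printing the event in the debugger after a crash would show which function set cancelled, which narrows down the search for a cancel done without the lock.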

                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163156#comment-13163156 ] 

John Plevyak commented on TS-937:
---------------------------------

This patch shouldn't have any effect in correct code.  Checking ->cancelled outside of the mutex is at best an optimization.  In any case, before the continuation is used, the cancelled field has to be checked under the lock.  Most likely the core problem is that someone is canceling the event without holding the lock (a situation which subsumes the case of being in a callback, as weijin suggested).  If this happens, there is a potential race where process_event is in progress, someone cancels the event (without the lock) and deletes the Continuation, resulting in a crash.

However, this patch does reduce the chance of a race condition in the debug build.  The core problem is that the macro MUTEX_LOCK_FOR is accessing HANDLER_NAME, which dereferences the continuation.  This is bad.  The macro needs to be fixed to delay dereferencing the Continuation until the lock is taken.
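
The locking rule described above can be sketched roughly as follows. This is a simplified illustration using std::mutex, not the Traffic Server implementation; the Event and Continuation types, cancel_event(), and this version of process_event() are hypothetical stand-ins. The mutex is held through a shared pointer on the assumption that it can outlive the continuation, similar in spirit to e->mutex in the backtrace:

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical, simplified stand-ins for Continuation and Event.
struct Continuation {
  int calls = 0;
  void handle_event() { ++calls; }
};

struct Event {
  std::shared_ptr<std::mutex> mutex;    // lock shared with the continuation
  Continuation *continuation = nullptr; // may be freed once cancelled
  bool cancelled = false;               // read and written only under mutex
};

// Cancel with the lock held, so the dispatcher cannot be mid-callback.
inline void cancel_event(Event &e) {
  std::lock_guard<std::mutex> guard(*e.mutex);
  e.cancelled = true;
}

// Returns true if the continuation was called. Nothing reachable through
// e.continuation is touched until the lock is held and cancelled is checked.
inline bool process_event(Event &e) {
  std::unique_lock<std::mutex> lock(*e.mutex, std::try_to_lock);
  if (!lock.owns_lock())
    return false; // lock busy: a real dispatcher would reschedule the event
  if (e.cancelled)
    return false; // the owner may already have freed the continuation
  e.continuation->handle_event();
  return true;
}
```

The point of the ordering is that a debug helper which reads the handler name from the continuation before the lock is taken would break this rule, which is the macro problem described above.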
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Leif Hedstrom (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164103#comment-13164103 ] 

Leif Hedstrom commented on TS-937:
----------------------------------

I saw a bunch of TS_HAS_PURIFY while doing all the memory allocation changes. Should we just nuke that sucker, now that we have jemalloc / tcmalloc support? I have no idea if the "purify" stuff actually works.

Although in this case, no idea why we'd make a special case here for purify...
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198530#comment-13198530 ] 

Brian Geffon commented on TS-937:
---------------------------------

nevermind, I lied. Reopening :/
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Leif Hedstrom (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166718#comment-13166718 ] 

Leif Hedstrom commented on TS-937:
----------------------------------

Sold.

I did add a --disable-freelist configure option a while ago, which turns the freelist into malloc/free calls (I hope at least, unless I fucked it up :). The thought was that we'd use this option for memory debugging with either valgrind or, e.g., tcmalloc.
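
To illustrate why that helps memory debugging, here is a rough sketch of a freelist with a compile-time switch. The USE_FREELIST macro and the FreeList class are hypothetical and are not how Traffic Server implements the option. With the freelist enabled, released blocks are recycled internally and never reach free(), so a tool such as valgrind cannot flag a later use of them; with it disabled, every release is a real free():

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical switch standing in for whatever --disable-freelist defines.
#ifndef USE_FREELIST
#define USE_FREELIST 0
#endif

class FreeList {
  struct Node { Node *next; };

public:
  // Blocks must be large enough to hold the intrusive list link.
  explicit FreeList(std::size_t size)
    : size_(size < sizeof(Node) ? sizeof(Node) : size) {}

  ~FreeList() {
    while (head_) { Node *n = head_; head_ = n->next; std::free(n); }
  }

  void *alloc() {
#if USE_FREELIST
    // Reuse a cached block if one is available.
    if (head_) { Node *n = head_; head_ = n->next; return n; }
#endif
    return std::malloc(size_);
  }

  void release(void *p) {
#if USE_FREELIST
    // Cache the block; the allocator never sees it as freed.
    Node *n = static_cast<Node *>(p);
    n->next = head_;
    head_ = n;
#else
    // Hand the block back so memory checkers can detect use-after-free.
    std::free(p);
#endif
  }

private:
  std::size_t size_;
  Node *head_ = nullptr;
};
```

For a bug like this one, where a Continuation may be used after it was deleted, the malloc/free path is the one that gives a memory checker a chance to report the stale access.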
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian


        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled thread

Posted by "Brian Geffon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093235#comment-13093235 ] 

Brian Geffon commented on TS-937:
---------------------------------

Patch included.

> EThread::execute still processing cancelled thread
> --------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled. 
> Brian

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198523#comment-13198523 ] 

Brian Geffon commented on TS-937:
---------------------------------

So I've tracked down where the event is being cancelled: 

PluginVC::process_close() line 699:

  if (core_lock_retry_event) {
    core_lock_retry_event->cancel();
    core_lock_retry_event = NULL;
  }
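
The cancel-and-clear pattern quoted above can be looked at in isolation. The types below are hypothetical stand-ins rather than the ATS classes, and the sketch assumes that cancel() only sets a flag and leaves the event on whatever queue it is already in, which is consistent with the cancelled = 1 value in the GDB dump.

```cpp
#include <cassert>

// Hypothetical stand-in for a scheduled event; not the ATS Event class.
struct RetryEvent {
  bool cancelled = false;
  void cancel() { cancelled = true; }  // assumption: flag only, no dequeue
};

// Hypothetical stand-in for the object that owns a pending retry event.
struct SketchConnection {
  RetryEvent* retry_event = nullptr;

  void schedule_retry(RetryEvent* e) {
    assert(e != nullptr);
    retry_event = e;
  }

  // Same shape as the quoted snippet: cancel the pending event, then
  // drop the pointer so a second close() is a no-op.
  void close() {
    if (retry_event) {
      retry_event->cancel();
      retry_event = nullptr;
    }
  }
};
```

Under that assumption the owner forgets the event, but the event thread can still pick it up later, so the dispatch side has to honour the flag.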




                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch
>
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213794#comment-13213794 ] 

Brian Geffon edited comment on TS-937 at 2/22/12 5:55 PM:
----------------------------------------------------------

@Igor: short answer: it doesn't. The crasher is resolved by removing the HANDLER_NAME macro. I suppose I should move the TS_HAS_PURIFY removal into a new Jira ticket so that we can potentially backport the removal of HANDLER_NAME, since it causes segfaults in debug builds. I'll do that and commit all this stuff tonight.
                
      was (Author: briang):
    @Igor: short answer: it doesn't. The crasher is resolved by removing the HANDLER_NAME macro, I suppose I make the remove TS_HAS_PURIFY stuff into a new Jira ticket so that we can potentially backport the removal of HANDLER_NAME since it causes segfaults in debug builds. I'll do that and commit all this stuff tonight.
                  
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch, ts937.full.patch
>
>

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213794#comment-13213794 ] 

Brian Geffon commented on TS-937:
---------------------------------

@Igor: short answer: it doesn't. The crasher is resolved by removing the HANDLER_NAME macro. I suppose I should make the TS_HAS_PURIFY removal a new Jira ticket so that we can potentially backport the removal of HANDLER_NAME, since it causes segfaults in debug builds. I'll do that and commit all this stuff tonight.
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch, ts937.full.patch
>
>

        

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon updated TS-937:
----------------------------

    Comment: was deleted

(was: nevermind, I lied. Reopening :/)
    
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164911#comment-13164911 ] 

Brian Geffon commented on TS-937:
---------------------------------

Please ignore the notifications about a comment mentioning inclusion in 3.0.2; I was mistaken.

                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "weijin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095430#comment-13095430 ] 

weijin commented on TS-937:
---------------------------

The main reason EThread processes the cancelled event is that the cancellation is performed in another thread, so I do not think this patch solves the problem completely.
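
To illustrate the concern: if the flag is set from another thread, a check made just before dispatch can still pass and then be overtaken by the cancel. One common way to narrow that window is to re-check the flag while holding the same lock the canceller takes. The sketch below is an assumption about how that could look, not the fix committed for this issue; GuardedEvent, cancel_event, and dispatch_event are hypothetical names, and the sketch also assumes the event and its mutex outlive both calls.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for an event guarded by its continuation's mutex.
struct GuardedEvent {
  std::mutex mutex;
  std::atomic<bool> cancelled{false};
  int handled = 0;  // counts handler invocations
};

// Canceller side: holding the lock means the cancel cannot interleave
// with a handler that is currently running.
inline void cancel_event(GuardedEvent& e) {
  std::lock_guard<std::mutex> lock(e.mutex);
  e.cancelled.store(true);
}

// Event-loop side: returns true only if the handler actually ran.
inline bool dispatch_event(GuardedEvent& e) {
  std::unique_lock<std::mutex> lock(e.mutex, std::try_to_lock);
  if (!lock.owns_lock()) {
    return false;  // lock busy; a real loop would reschedule and retry
  }
  assert(lock.owns_lock());  // from here on the lock is held
  if (e.cancelled.load()) {
    return false;  // re-check under the lock before touching the handler
  }
  ++e.handled;
  return true;
}
```

Because the flag is read under the lock, a cancel that completes before the dispatch acquires the lock is always observed. A cancel that arrives while the handler runs simply waits for it to finish.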

> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.1
>
>         Attachments: UnixEThread.patch
>
>

        

[jira] [Resolved] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon resolved TS-937.
-----------------------------

    Resolution: Fixed

Fixed in commit cd6eb8f62272aad10bd3ac49bd4c6a20d36b566f
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch, ts937.full.patch
>
>

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Igor Galić (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213604#comment-13213604 ] 

Igor Galić commented on TS-937:
-------------------------------

ad ts937.full.patch: Now that I see them all in one place: How did this ever make sense?
Looks good, +1

What I don't get is what the crasher has to do with the PURIFY cleanup.
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch, ts937.full.patch
>
>
> The included GDB log shows ATS trying to process an event that has already been cancelled. Examining UnixEThread.cc line 232 shows that EThread::process_event is called without first checking whether the event has been cancelled.
> Brian
> {noformat}
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff64fa700 (LWP 28518)]
> 0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
> 130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
> Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
> (gdb) bt
> #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
> #1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
> #2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
> #3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
> #4  0x000000361f8e577d in clone () from /lib64/libc.so.6
> (gdb) bt full
> #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
>         lock = {m = {m_ptr = 0x7ffff64f9d20}, lock_acquired = 202}
> #1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
>         done_one = false
>         e = 0x1db45c0
>         NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
>         next_time = 1314647904419648000
> #2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
>         p = 0xfb7e80
> #3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x000000361f8e577d in clone () from /lib64/libc.so.6
> No symbol table info available.
> (gdb) f 0
> #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
> 130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
> (gdb) p *e
> $2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x7ffff68ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, 
>   immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Alan M. Carroll (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan M. Carroll updated TS-937:
-------------------------------

    Fix Version/s:     (was: 3.1.2)
                   3.1.3
    
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch
>
>

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214044#comment-13214044 ] 

Brian Geffon commented on TS-937:
---------------------------------

I've created a new ticket, TS-1117 (Remove TS_HAS_PURIFY), and I'll close this one as a duplicate of it: removing TS_HAS_PURIFY indirectly fixes this bug because it also removes every use of HANDLER_NAME.
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch, ts937.full.patch
>
>

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163157#comment-13163157 ] 

John Plevyak commented on TS-937:
---------------------------------

On a side note, this macro should not do the dereference on non-debug builds, but for some reason the condition appears to be TS_HAS_PURIFY. What the heck is that all about?
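
To illustrate the point about dereferencing only in debug builds, here is a minimal sketch of a diagnostic macro that reads through a continuation pointer only when a debug flag is defined. `SketchContinuation`, `SKETCH_DEBUG`, `SKETCH_HANDLER_NAME`, and `lock_tag` are hypothetical names made up for this example; they are not the Traffic Server definitions, which this thread does not quote.

```cpp
// Hypothetical stand-in for a continuation; not the real ATS class.
struct SketchContinuation {
  const char *handler_name;
};

// Hypothetical diagnostic macro. Only the debug variant reads through the
// pointer; the release variant evaluates the argument but never dereferences it.
#ifdef SKETCH_DEBUG
#define SKETCH_HANDLER_NAME(c) ((c) ? (c)->handler_name : static_cast<const char *>(nullptr))
#else
#define SKETCH_HANDLER_NAME(c) ((void)(c), static_cast<const char *>(nullptr))
#endif

// Returns the name that would be attached to a lock attempt for diagnostics.
inline const char *lock_tag(SketchContinuation *c) {
  return SKETCH_HANDLER_NAME(c);
}
```

With `SKETCH_DEBUG` undefined, `lock_tag` returns a null pointer without reading through its argument, so a stale continuation pointer is harmless on that path.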
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Leif Hedstrom (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199504#comment-13199504 ] 

Leif Hedstrom commented on TS-937:
----------------------------------

Brian, should we keep this for 3.1.2, or move it out to 3.1.3?
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199506#comment-13199506 ] 

Brian Geffon commented on TS-937:
---------------------------------

I'll move it out to 3.1.3; I'm preparing my fix. Basically, it's going to involve 1) blowing away the HANDLER_NAME macro for a quick fix, 2) identifying why PluginVC is canceling an action without holding the lock, and 3) blowing away the TS_HAS_PURIFY crap, which is not a big deal but affects hundreds of files.
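
Item 2 is about lock discipline around cancellation. As a minimal sketch, assuming a simplified event guarded by a plain `std::mutex`: the dispatcher takes the lock first and only then reads the cancelled flag, and the canceller sets the flag while holding the same lock. `SketchEvent`, `Outcome`, `process_event`, and `cancel` are hypothetical names for illustration, not the ATS types or the fix that was eventually committed.

```cpp
#include <mutex>

// Hypothetical, simplified event; not the real ATS Event/Action types.
struct SketchEvent {
  std::mutex *mutex = nullptr; // lock guarding the continuation's state
  bool cancelled = false;      // only read or written while holding *mutex
  int calls = 0;               // number of times the handler ran
};

enum class Outcome { Retry, Dropped, Handled };

// Dispatch step: try to take the lock, then look at `cancelled`.
// A missed lock means "reschedule and try again later".
inline Outcome process_event(SketchEvent &e) {
  std::unique_lock<std::mutex> lock(*e.mutex, std::try_to_lock);
  if (!lock.owns_lock()) {
    return Outcome::Retry;
  }
  if (e.cancelled) {
    return Outcome::Dropped; // cancelled events are never delivered
  }
  ++e.calls; // stands in for calling the continuation's handler
  return Outcome::Handled;
}

// Cancellation holds the same lock, so it cannot interleave with a
// handler call that is already in progress.
inline void cancel(SketchEvent &e) {
  std::lock_guard<std::mutex> lock(*e.mutex);
  e.cancelled = true;
}
```

If `cancel` skipped the lock, the flag could flip (and the continuation could be torn down) between the dispatcher's check and the handler call, which is the kind of race item 2 sets out to track down.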
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.3
>
>         Attachments: UnixEThread.patch
>
>

        

[jira] [Assigned] (TS-937) EThread::execute still processing cancelled event

Posted by "Leif Hedstrom (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom reassigned TS-937:
--------------------------------

    Assignee: Brian Geffon
    
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>

        

[jira] [Updated] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon updated TS-937:
----------------------------

    Summary: EThread::execute still processing cancelled event  (was: EThread::execute still processing cancelled thread)

> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>         Attachments: UnixEThread.patch
>
>

        

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198526#comment-13198526 ] 

Brian Geffon commented on TS-937:
---------------------------------

So it turns out that this bug is fixed in TS-1074. I've verified this and am closing this bug.
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log shows that ATS is trying to process an event that has already been cancelled. Examining UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled.
> Brian

        

[jira] [Reopened] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon reopened TS-937:
-----------------------------

    
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>            Assignee: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log shows that ATS is trying to process an event that has already been cancelled. Examining UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled.
> Brian

        

Re:[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by taorui <we...@126.com>.
Hmm, why doesn't the cancellation check in process_event cover timeout events?

At 2011-09-06 08:07:09,"Brian Geffon (JIRA)" <ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097644#comment-13097644 ] 
>
>Brian Geffon commented on TS-937:
>---------------------------------
>
>Thanks for the response, weijin. Perhaps I'm missing something, but there is currently a check for the event being cancelled in ProcessEvent for all events that do not have a timeout. If the event has a timeout on it, there won't be a check. Why would it be safe to put the cancel check in ProcessEvent in that situation?
>
>[http://svn.apache.org/viewvc/trafficserver/traffic/trunk/iocore/eventsystem/UnixEThread.cc?view=markup#l234]
>
>
>
>> EThread::execute still processing cancelled event
>> -------------------------------------------------
>>
>>                 Key: TS-937
>>                 URL: https://issues.apache.org/jira/browse/TS-937
>>             Project: Traffic Server
>>          Issue Type: Bug
>>          Components: Core
>>    Affects Versions: 3.0.1, 2.1.9
>>         Environment: RHEL6
>>            Reporter: Brian Geffon
>>             Fix For: 3.1.1
>>
>>         Attachments: UnixEThread.patch
>>
>>
>> The included GDB log shows that ATS is trying to process an event that has already been cancelled. Examining UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled.
>> Brian

[jira] [Commented] (TS-937) EThread::execute still processing cancelled event

Posted by "Brian Geffon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097644#comment-13097644 ] 

Brian Geffon commented on TS-937:
---------------------------------

Thanks for the response, weijin. Perhaps I'm missing something, but there is currently a check for the event being cancelled in ProcessEvent for all events that do not have a timeout. If the event has a timeout on it, there won't be a check. Why would it be safe to put the cancel check in ProcessEvent in that situation?

[http://svn.apache.org/viewvc/trafficserver/traffic/trunk/iocore/eventsystem/UnixEThread.cc?view=markup#l234]
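
To make the question concrete, here is a minimal sketch of an event loop that performs the cancellation check once, before deciding whether an event is immediate or timed, so that neither kind can reach the handler after being cancelled. This is not the Traffic Server code: Event, MiniDispatcher, schedule, run_due and queued are hypothetical names invented for illustration, and all locking is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for an event; only the fields the sketch needs.
struct Event {
  int id;
  bool cancelled;
  std::int64_t timeout_at; // 0 means "immediate", > 0 means "run at or after this time"
};

// Hypothetical dispatcher for illustration only, not the real EThread.
class MiniDispatcher {
public:
  void schedule(const Event &e) { pending_.push_back(e); }

  // Handles every event that is due at `now` and returns the ids handled.
  // Events that are not yet due stay queued; cancelled events are dropped.
  std::vector<int> run_due(std::int64_t now) {
    std::vector<int> handled;
    std::deque<Event> not_due;
    while (!pending_.empty()) {
      Event e = pending_.front();
      pending_.pop_front();
      // One cancellation check, applied before the immediate/timed split.
      if (e.cancelled) {
        continue;
      }
      if (e.timeout_at == 0 || e.timeout_at <= now) {
        handled.push_back(e.id); // stands in for calling the continuation
      } else {
        not_due.push_back(e);
      }
    }
    pending_.swap(not_due);
    return handled;
  }

  std::size_t queued() const { return pending_.size(); }

private:
  std::deque<Event> pending_;
};
```

With a mix of immediate and timed events scheduled, some of them cancelled, a call to run_due hands only the live, due events to the handler and leaves live future events queued. Whether a single up-front check like this is safe in the real event system depends on locking details that this sketch does not model.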



> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.1
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log shows that ATS is trying to process an event that has already been cancelled. Examining UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled.
> Brian