Posted to bugs@httpd.apache.org by bu...@apache.org on 2008/02/13 01:47:40 UTC

DO NOT REPLY [Bug 44402] New: - Worker mpm crashes (SEGV) under stress with static workload on 64 bit solaris x86

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=44402>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=44402

           Summary: Worker mpm crashes (SEGV) under stress with static
                    workload on 64 bit solaris x86
           Product: Apache httpd-2
           Version: 2.2.6
          Platform: Sun
        OS/Version: Solaris
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: worker
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: basant.kukreja@sun.com


I am running the SPECweb99 static content workload with httpd 2.2.6 on Solaris
Nevada (snv_79) and am seeing several crashes. The crash typically reproduces
within 10 minutes. Here are the details:
Apache version : httpd-2.2.6
Simultaneous connection : 1000
Hardware : X4100 Server (4 core 2.8 GHz)
CPU : Only single core is enabled
Architecture : x86_64


httpd.conf contains :
<IfModule worker.c>
ListenBackLog     50000
StartServers         2
ThreadLimit        500
ThreadsPerChild    500
MinSpareThreads    100
MaxSpareThreads    100
MaxClients        1000
MaxRequestsPerChild  0
</IfModule>

Listen 192.168.21.1:80
Listen 192.168.22.1:80

Configure option:
CFLAGS="-g -mt -m64 -KPIC " ./configure --prefix=<prefix_path> --with-mpm=worker
--enable-modules=all --with-ssl=/usr/sfw --enable-mods-shared=all --enable-cgi
--enable-threads && gmake && gmake install

Here is the most common stack trace.

Crash 1:
(dbx) where
current thread: t@76
=>[1] allocator_free(allocator = 0x101f870, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0x102fb88), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0x1020550, b = 0x101f9e8), line 899 in
"core_filters.c"
  [4] ap_pass_brigade(next = 0x1020550, bb = 0x101f9e8), line 526 in "util_filter.c"
  [5] logio_out_filter(f = 0x10204e0, bb = 0x101f9e8), line 135 in "mod_logio.c"
  [6] ap_pass_brigade(next = 0x10204e0, bb = 0x101f9e8), line 526 in "util_filter.c"
  [7] ap_flush_conn(c = 0x101fd00), line 84 in "connection.c"
  [8] ap_lingering_close(c = 0x101fd00), line 123 in "connection.c"
  [9] process_socket(p = 0x101f968, sock = 0x101f9e8, my_child_num = 1,
my_thread_num = 227, bucket_alloc = 0x1029a88), line 545 in "worker.c"
  [10] worker_thread(thd = 0x5bed38, dummy = 0x6dbac0), line 894 in "worker.c"
  [11] dummy_worker(opaque = 0x5bed38), line 142 in "thread.c"
  [12] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [13] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0

Crash 2 :
(dbx) where
current thread: t@363
=>[1] apr_palloc(pool = 0x21680007952225ff, size = 18446744073323675656U), line
601 in "apr_pools.c"
  [2] apr_sockaddr_ip_get(addr = 0xcda3d0, sockaddr = 0x42d790), line 104 in
"sockaddr.c"
  [3] core_create_conn(ptrans = 0xcda2d8, server = 0x4bf600, csd = 0xcda358, id
= 360, sbh = 0xcda378, alloc = 0xd147e8), line 3895 in "core.c"
  [4] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x45fe03
  [5] process_socket(p = 0xcda2d8, sock = 0xcda358, my_child_num = 0,
my_thread_num = 360, bucket_alloc = 0xd147e8), line 542 in "worker.c"
  [6] worker_thread(thd = 0x7192f8, dummy = 0x7e45a0), line 894 in "worker.c"
  [7] dummy_worker(opaque = 0x7192f8), line 142 in "thread.c"
  [8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0

I tried httpd-2.2.8 too but got a similar crash:
Crash 3 (with httpd-2.2.8):
=>[1] apr_palloc(pool = 0x226800079e7a25ff, size = 18446744073323675656U), line
630 in "apr_pools.c"
  [2] apr_sockaddr_ip_get(addr = 0xc57060, sockaddr = 0x42dab8), line 104 in
"sockaddr.c"
  [3] core_create_conn(ptrans = 0xc56f68, server = 0x4c0378, csd = 0xc56fe8, id
= 951, sbh = 0xc57008, alloc = 0xc58f78), line 3895 in "core.c"
  [4] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x4604e3
  [5] process_socket(p = 0xc56f68, sock = 0xc56fe8, my_child_num = 1,
my_thread_num = 451, bucket_alloc = 0xc58f78), line 542 in "worker.c"
  [6] worker_thread(thd = 0x870c88, dummy = 0x7e7e30), line 894 in "worker.c"
  [7] dummy_worker(opaque = 0x870c88), line 142 in "thread.c"
  [8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0

prefork mpm works just fine.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org




rpluem@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO




------- Additional Comments From rpluem@apache.org  2008-02-13 13:03 -------
First guess for your last crash in comment #4 (all line numbers 2.2.8):

lr->accept_func(&csd, lr, ptrans); (line 742 in worker.c) fails with
rv != APR_SUCCESS, but with a non-NULL value for csd.
In contrast to the code in prefork, we don't check for this situation:

Lines 621 - 631 of prefork.c:

        status = lr->accept_func(&csd, lr, ptrans);

        SAFE_ACCEPT(accept_mutex_off());      /* unlock after "accept" */

        if (status == APR_EGENERAL) {
            /* resource shortage or should-not-occur occured */
            clean_child_exit(1);
        }
        else if (status != APR_SUCCESS) {
            continue;
        }

Maybe we need to do a continue in the worker case as well, or we need to do
something like the following:

Index: server/mpm/worker/worker.c
===================================================================
--- server/mpm/worker/worker.c  (Revision 627576)
+++ server/mpm/worker/worker.c  (Arbeitskopie)
@@ -743,6 +743,9 @@
             /* later we trash rv and rely on csd to indicate success/failure */
             AP_DEBUG_ASSERT(rv == APR_SUCCESS || !csd);

+            if (rv != APR_SUCCESS) {
+                csd = NULL;
+            }
             if (rv == APR_EGENERAL) {
                 /* E[NM]FILE, ENOMEM, etc */
                 resource_shortage = 1;





------- Additional Comments From rpluem@apache.org  2008-02-16 14:23 -------
I think I have to correct myself in two points.

1. On APR trunk there are better implementations of apr_atomic_casptr which no
   longer use a mutex, but native processor / OS features. So, in contrast to
   my first assumption, your patch could cause a performance degradation on
   trunk, which would be bad.

2. The race scenario you described cannot happen in this way, because it assumes
   that multiple threads pop pools from the list in parallel. This is not the
   case, as only the listener thread does this. What does happen in parallel is:

   - Multiple pushes to the list
   - (Multiple) pushes to the list plus a pop

OTOH I still believe that there is some kind of race scenario, as your patch
showed that the error goes away if the locking / syncing is changed here.
So maybe it's only a different scenario (that I haven't figured out so far), or
there is a bug in apr_atomic_casptr.
Do the same crashes happen with trunk?
   

------- Additional Comments From basant.kukreja@sun.com  2008-02-12 19:17 -------
Here is the crash from 32-bit Apache:
=>[1] allocator_free(allocator = 0x8aae018, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0x8b629b8), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0x8aae870, b = 0x8aae0e0), line 899 in
"core_filters.c"
  [4] ap_pass_brigade(next = 0x8aae870, bb = 0x8aae0e0), line 526 in "util_filter.c"
  [5] logio_out_filter(f = 0x8aae830, bb = 0x8aae0e0), line 135 in "mod_logio.c"
  [6] ap_pass_brigade(next = 0x8aae830, bb = 0x8aae0e0), line 526 in "util_filter.c"
  [7] ap_flush_conn(c = 0x8aae390), line 84 in "connection.c"
  [8] ap_lingering_close(c = 0x8aae390), line 123 in "connection.c"
  [9] process_socket(p = 0x8aae0a0, sock = 0x8aae0e0, my_child_num = 1,
my_thread_num = 249, bucket_alloc = 0x8b5c9a0), line 545 in "worker.c"
  [10] worker_thread(thd = 0x81a6788, dummy = 0x831f5a0), line 894 in "worker.c"
  [11] dummy_worker(opaque = 0x81a6788), line 142 in "thread.c"
  [12] _thr_setup(0xf004d200), at 0xfec6f282
  [13] _lwp_start(0xfee7ddb9, 0xfee7f55e, 0xfffffff6, 0x0, 0x1, 0xfeea1984), at
0xfec6f4e0


------- Additional Comments From basant.kukreja@sun.com  2008-02-14 14:15 -------
Thanks, Ruediger, for your suggestion. I will explore along the lines you
suggested.

Meanwhile, here is the 3rd type of crash (with your patch applied).

t@13 (l@13) terminated by signal SEGV (Segmentation Fault)
Current function is apr_pool_cleanup_kill
 2045       c = p->cleanups;
(dbx) where
current thread: t@13
=>[1] apr_pool_cleanup_kill(p = 0xa0, data = 0x195b888, cleanup_fn =
0xfffffd7fff223540 = &`libapr-1.so.0.2.11`sockets.c`socket_cleanup(void *sock)),
line 2045 in "apr_pools.c"
  [2] apr_pool_cleanup_run(p = 0xa0, data = 0x195b888, cleanup_fn =
0xfffffd7fff223540 = &`libapr-1.so.0.2.11`sockets.c`socket_cleanup(void *sock)),
line 2088 in "apr_pools.c"
  [3] apr_socket_close(thesocket = 0x195b888), line 149 in "sockets.c"
  [4] ap_lingering_close(c = 0x17407f0), line 135 in "connection.c"
  [5] process_socket(p = 0x1740748, sock = 0x195b888, my_child_num = 1,
my_thread_num = 10, bucket_alloc = 0x195b728), line 569 in "worker.c"
  [6] worker_thread(thd = 0x52bb48, dummy = 0x4f3480), line 951 in "worker.c"
  [7] dummy_worker(opaque = 0x52bb48), line 142 in "thread.c"
  [8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0
(dbx) up
Current function is apr_pool_cleanup_run
 2088       apr_pool_cleanup_kill(p, data, cleanup_fn);
(dbx) up
Current function is apr_socket_close
  149       return apr_pool_cleanup_run(thesocket->pool, thesocket, socket_cleanup);
(dbx) p *thesocket
*thesocket = {
    pool                    = 0xa0
    socketdes               = 26588968
    type                    = 0
    protocol                = 26588928
    local_addr              = 0x195b7e8
    remote_addr             = 0x1741158
    timeout                 = 24383832
    local_port_unknown      = 4895848
    local_interface_unknown = 0
    remote_addr_unknown     = 0
    options                 = 0
    inherit                 = 0
    userdata                = (nil)
}
(dbx) dump
thesocket = 0x195b888
(dbx) up
Current function is ap_lingering_close
  135           apr_socket_close(csd);
(dbx) dump
timeup = 0
dummybuf = ""
c = 0x17407f0
nbytes = 4294967296U
csd = 0x195b888
(dbx) p *c
*c = {
    pool                  = 0x1740748
    base_server           = 0x4c0300
    vhost_lookup_data     = (nil)
    local_addr            = 0x195b8d8
    remote_addr           = 0x195ba18
    remote_ip             = 0x1740f88 "192.168.22.2"
    remote_host           = (nil)
    remote_logname        = (nil)
    aborted               = 0
    keepalive             = AP_CONN_UNKNOWN
    double_reverse        = 0
    keepalives            = 0
    local_ip              = 0x1740f78 "192.168.22.1"
    local_host            = (nil)
    id                    = 510
    conn_config           = 0x1740898
    notes                 = 0x1740dd8
    input_filters         = 0x1740fa8
    output_filters        = 0x1740fd0
    sbh                   = 0x17407e8
    bucket_alloc          = 0x195b728
    cs                    = (nil)
    data_in_input_filters = 0
}
(dbx) _arch_networkio.h`struct apr_socket_t*)csd                              <
*((struct apr_socket_t *) csd) = {
    pool                    = 0xa0
    socketdes               = 26588968
    type                    = 0
    protocol                = 26588928
    local_addr              = 0x195b7e8
    remote_addr             = 0x1741158
    timeout                 = 24383832
    local_port_unknown      = 4895848
    local_interface_unknown = 0
    remote_addr_unknown     = 0
    options                 = 0
    inherit                 = 0
    userdata                = (nil)
}
(dbx) etworkio.h`struct apr_socket_t*)csd->local_addr                         <
dbx: can't find field "local_addr" in "*(csd)"
(dbx) p ((`srclib/apr/include/arch/unix/apr_arch_networkio.h`struct apr_socke >
((struct apr_socket_t *) csd)->local_addr = 0x195b7e8
(dbx) p *((`srclib/apr/include/arch/unix/apr_arch_networkio.h`struct apr_sock >
*((struct apr_socket_t *) csd)->local_addr = {
    pool         = 0xa0
    hostname     = 0x195b728 "H^Gt^A"
    servname     = 0x195b700 ""
    port         = 0
    family       = 0
    salen        = 24383744U
    ipaddr_len   = 0
    addr_str_len = 24383744
    ipaddr_ptr   = 0xfffffd7fff2e8010
    next         = (nil)
    sa           = {
        sin  = {
            sin_family = 0
            sin_port   = 0
            sin_addr   = {
                S_un = {
                    S_un_b = {
                        s_b1 = '\0'
                        s_b2 = '\0'
                        s_b3 = '\0'
                        s_b4 = '\0'
                    }
                    S_un_w = {
                        s_w1 = 0
                        s_w2 = 0
                    }
                    S_addr = 0
                }
            }
            sin_zero   = ""
        }
        sin6 = {
            sin6_family   = 0
            sin6_port     = 0
            sin6_flowinfo = 0
            sin6_addr     = {
                _S6_un = {
                    _S6_u8     = ""
                    _S6_u32    = (0, 0, 4373928U, 0)
                    __S6_align = 0
                }
            }
            sin6_scope_id = 26588968U
            __sin6_src_id = 0
        }
        sas  = {
            ss_family = 0
            _ss_pad1  = ""
            _ss_align = 0.0
            _ss_pad2  = "xxB"
        }
    }
}


------- Additional Comments From basant.kukreja@sun.com  2008-02-12 19:43 -------
Here is the debug information from a crash of 32-bit Apache:

t@414 (l@414) terminated by signal SEGV (Segmentation Fault)
Current function is apr_sockaddr_ip_get
  104       *addr = apr_palloc(sockaddr->pool, sockaddr->addr_str_len);
(dbx) where
current thread: t@414
=>[1] apr_sockaddr_ip_get(addr = 0x974a3d0, sockaddr = (nil)), line 104 in
"sockaddr.c"
  [2] core_create_conn(ptrans = 0x974a348, server = 0x80d9020, csd = 0x974a388,
id = 411, sbh = 0x974a398, alloc = 0x9788670), line 3895 in "core.c"
  [3] ap_run_create_connection(0x974a348, 0x80d9020, 0x974a388, 0x19b,
0x974a398, 0x9788670), at 0x8090ae8
  [4] process_socket(p = 0x974a348, sock = 0x974a388, my_child_num = 0,
my_thread_num = 411, bucket_alloc = 0x9788670), line 542 in "worker.c"
  [5] worker_thread(thd = 0x83a6ff8, dummy = 0x8125e80), line 894 in "worker.c"
  [6] dummy_worker(opaque = 0x83a6ff8), line 142 in "thread.c"
  [7] _thr_setup(0xf008e200), at 0xfec6f282
  [8] _lwp_start(0x0, 0xfee8410c, 0xe451bef8, 0xe451bef8, 0x8081a83, 0x974a3d0),
at 0xfec6f4e0
(dbx) p sockaddr
sockaddr = (nil)
(dbx) where
current thread: t@414
=>[1] apr_sockaddr_ip_get(addr = 0x974a3d0, sockaddr = (nil)), line 104 in
"sockaddr.c"
  [2] core_create_conn(ptrans = 0x974a348, server = 0x80d9020, csd = 0x974a388,
id = 411, sbh = 0x974a398, alloc = 0x9788670), line 3895 in "core.c"
  [3] ap_run_create_connection(0x974a348, 0x80d9020, 0x974a388, 0x19b,
0x974a398, 0x9788670), at 0x8090ae8
  [4] process_socket(p = 0x974a348, sock = 0x974a388, my_child_num = 0,
my_thread_num = 411, bucket_alloc = 0x9788670), line 542 in "worker.c"
  [5] worker_thread(thd = 0x83a6ff8, dummy = 0x8125e80), line 894 in "worker.c"
  [6] dummy_worker(opaque = 0x83a6ff8), line 142 in "thread.c"
  [7] _thr_setup(0xf008e200), at 0xfec6f282
  [8] _lwp_start(0x0, 0xfee8410c, 0xe451bef8, 0xe451bef8, 0x8081a83, 0x974a3d0),
at 0xfec6f4e0
(dbx) up
Current function is core_create_conn
 3895       apr_sockaddr_ip_get(&c->local_ip, c->local_addr);
(dbx) p *c
*c = {
    pool                  = 0x974a348
    base_server           = (nil)
    vhost_lookup_data     = (nil)
    local_addr            = (nil)
    remote_addr           = (nil)
    remote_ip             = (nil)
    remote_host           = (nil)
    remote_logname        = (nil)
    aborted               = 0
    keepalive             = AP_CONN_UNKNOWN
    double_reverse        = 0
    keepalives            = 0
    local_ip              = (nil)
    local_host            = (nil)
    id                    = 0
    conn_config           = 0x974a400
    notes                 = 0x974a6a0
    input_filters         = (nil)
    output_filters        = (nil)
    sbh                   = 0x974a398
    bucket_alloc          = (nil)
    cs                    = (nil)
    data_in_input_filters = 0
}
(dbx) dump
alloc = 0x9788670
rv = 0
ptrans = 0x974a348
server = 0x80d9020
sbh = 0x974a398
c = 0x974a3a0
id = 411
csd = 0x974a388
(dbx) _arch_networkio.h`struct apr_socket_t*)csd                              <
*((struct apr_socket_t *) csd) = {
    pool                    = (nil)
    socketdes               = 158893680
    type                    = -17726080
    protocol                = 134660748
    local_addr              = (nil)
    remote_addr             = 0x19b
    timeout                 = 158638920LL
    local_port_unknown      = 0
    local_interface_unknown = 0
    remote_addr_unknown     = 0
    options                 = 0
    inherit                 = 0
    userdata                = (nil)
}

Please let me know if any other information is needed.


------- Additional Comments From rpluem@apache.org  2008-02-16 00:25 -------
(In reply to comment #12)

> If you agree that there is a clear race condition then to correct this, I have
> following suggestion :
> (a) Use a dedicated pool for each worker thread, this will avoid any locking.
> It will perform better but may require little more memory in those situations
> when worker threads are not fully used.
> (b) Use some other technique other than a recycled pool list which avoids race
> conditions.
> 
> I am in favour of option (a) until some good idea for (b) comes to my mind.
> If you agree with (a) then I can work and generate a patch.
> 
> Note : Also I believe that the crash will happen in linux too. I never ran more
> than 1 hour in linux. I will try that tonight.
> 

Thank you for your thorough investigation. I agree with you that we have the
described race conditions here. We have a similar race in the event MPM.
Next steps:

1. Bring your patch above into trunk. Currently I see no significant performance
   loss over the current code, as we are using a mutex there as well; we only
   increase the time during which we lock the resource. I don't know right
   now when I will find the cycles to apply the patch to trunk, but if you could
   attach a trunk version of your patch to this report it would be a big help.

2. Move the further discussion regarding options a) and b) to
   dev@httpd.apache.org and let's wait for its results to decide how to move
   along and improve the situation here in the long run.
 


basant.kukreja@sun.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW




------- Additional Comments From basant.kukreja@sun.com  2008-02-13 22:56 -------
I did the stress test with the patch you suggested. Even with your patch, I
still got the 1st crash. If it crashes with the second stack trace, I will
update the bug.

Here is some more information about the 1st crash:
* I am able to reproduce the crash on Solaris 10 update 1 (on a different
  machine) too. It took around 4 hours of stress before I got the crash on
  Solaris 10, while it takes less than 30 minutes to reproduce on Solaris
  Nevada. It was crash 1 (allocator_free with node = null) (without your patch).

Here is more detail on crash 1:
(dbx) where
current thread: t@21
=>[1] allocator_free(allocator = 0x8afe2e0, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0xa0d01c0), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0xa0b28c8, b = 0xa0b2a08), line 899 in
"core_filters.c"
  [4] ap_pass_brigade(next = 0xa0b28c8, bb = 0xa0b2a08), line 526 in "util_filter.c"
  [5] logio_out_filter(f = 0xa0b2888, bb = 0xa0b2a08), line 135 in "mod_logio.c"
  [6] ap_pass_brigade(next = 0xa0b2888, bb = 0xa0b2a08), line 526 in "util_filter.c"
  [7] ap_flush_conn(c = 0xa0b23e8), line 84 in "connection.c"
  [8] ap_lingering_close(c = 0xa0b23e8), line 123 in "connection.c"
  [9] process_socket(p = 0x8afe368, sock = 0x8aff660, my_child_num = 1,
my_thread_num = 18, bucket_alloc = 0xa0be178), line 545 in "worker.c"
  [10] worker_thread(thd = 0x81487d8, dummy = 0x8117b30), line 894 in "worker.c"
  [11] dummy_worker(opaque = 0x81487d8), line 142 in "thread.c"
  [12] _thr_setup(0xfe244800), at 0xfeccf92e
  [13] _lwp_start(), at 0xfeccfc10
(dbx) up
Current function is apr_pool_clear
  710       allocator_free(pool->allocator, active->next);
(dbx) p *active
*active = {
    next        = (nil)
    ref         = 0xa0d01a8
    index       = 1U
    free_index  = 0
    first_avail = 0xa0d01f8 "\xc0^A^M\n\xfc^A^M\n\xfc^A^M\nx\xe1^K\n"
    endp        = 0xa0d21a8 "^A "
}
(dbx) up
Current function is ap_core_output_filter
  899               apr_pool_clear(ctx->deferred_write_pool);
(dbx) p *ctx
*ctx = {
    b                   = (nil)
    deferred_write_pool = 0xa0d01c0
}
(dbx) p *ctx->deferred_write_pool
*ctx->deferred_write_pool = {
    parent           = 0x8afe368
    child            = (nil)
    sibling          = 0xa0c6198
    ref              = 0x8afe36c
    cleanups         = (nil)
    free_cleanups    = (nil)
    allocator        = 0x8afe2e0
    subprocesses     = (nil)
    abort_fn         = (nil)
    user_data        = (nil)
    tag              = 0x80bfd1c "deferred_write"
    active           = 0xa0d01a8
    self             = 0xa0d01a8
    self_first_avail = 0xa0d01f8 "\xc0^A^M\n\xfc^A^M\n\xfc^A^M\nx\xe1^K\n"
}
(dbx) p *c
*c = {
    pool                  = 0x8afe368
    base_server           = 0x80e6bf8
    vhost_lookup_data     = (nil)
    local_addr            = 0x8aff698
    remote_addr           = 0x8aff7c0
    remote_ip             = 0xa0b2850 "192.168.11.1"
    remote_host           = (nil)
    remote_logname        = (nil)
    aborted               = 0
    keepalive             = AP_CONN_KEEPALIVE
    double_reverse        = 0
    keepalives            = 1
    local_ip              = 0xa0b2840 "192.168.11.2"
    local_host            = (nil)
    id                    = 518
    conn_config           = 0xa0b2448
    notes                 = 0xa0b26e8
    input_filters         = 0xa0b2870
    output_filters        = 0xa0b2888
    sbh                   = 0xa0b23e0
    bucket_alloc          = 0xa0be178
    cs                    = (nil)
    data_in_input_filters = 0
}

On putting in some printfs, I figured out the following:

In apr_pool_clear (when invoked for deferred_write_pool)
    ...
    active = pool->active = pool->self;
    active->first_avail = pool->self_first_avail;

    if (active->next == active)
        return;

active->next should normally form a circular linked list. What is happening in
some cases is that active->next points to something else, while active->ref
still points to active->next. I put a printf of active->next in before it is
set to NULL. For one particular crash, here is my debugging session; I found
that active->next was 0x20e8810 before it was set to NULL.

(dbx) up
Current function is apr_pool_clear
  774       allocator_free(pool->allocator, active->next);
(dbx) up
Current function is ap_core_output_filter
  923               apr_pool_clear(ctx->deferred_write_pool);
(dbx) p (struct apr_memnode_t*)0x20e8810 -----> This was active->next before set
to NULL.
(struct apr_memnode_t *) 0x20e8810 = 0x20e8810
(dbx) p *(struct apr_memnode_t*)0x20e8810
*((struct apr_memnode_t *) 0x20e8810) = {
    next        = 0x288c5b0
    ref         = 0x20e8810
    index       = 1U
    free_index  = 0
    first_avail = 0x20e9eb0 "GET /file_set/dir00104/class1_3 HTTP/1.0"
    endp        = 0x20ea810 "^A "
}
(dbx) down
Current function is apr_pool_clear
  774       allocator_free(pool->allocator, active->next);
(dbx) p active
active = 0x20e27e0
(dbx) p *((struct apr_memnode_t*)0x20e8810)->next
*((struct apr_memnode_t *) 0x20e8810)->next = {
    next        = 0x20e07d0
    ref         = 0x20e07d0
    index       = 1U
    free_index  = 0
    first_avail = 0x288d008 ""
    endp        = 0x288e5b0 "^A "
}
(dbx) p active
active = 0x20e27e0
(dbx) p *(((struct apr_memnode_t*)0x20e8810)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next = {
    next        = 0x28905d0
    ref         = 0x288c5b0
    index       = 1U
    free_index  = 0
    first_avail = 0x20e2738 ""
    endp        = 0x20e27d0 "^A "
}
(dbx) p *((((struct apr_memnode_t*)0x20e8810)->next)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next->next = {
    next        = 0x288e5c0
    ref         = 0x28905d0
    index       = 1U
    free_index  = 0
    first_avail = 0x2890668 "\xf8^E\x89^B"
    endp        = 0x28925d0 "^Q^P"
}
(dbx) p *(((((struct apr_memnode_t*)0x20e8810)->next)->next)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next->next->next = {
    next        = (nil)
    ref         = (nil)
    index       = 1U
    free_index  = 0
    first_avail = 0x288e5e8 "`^_"
    endp        = 0x28905c0 "^A "
}

On further debugging, I figured out that ap_core_output_filter is typically
called 4 times for a request. The crash always happens in the 4th invocation.
It seems to me that the list gets corrupted somewhere after the 3rd invocation
(after it returns from ap_core_output_filter) and before it enters
ap_core_output_filter the 4th time (when ap_lingering_close is in the call
stack). Also, conn->keepalives was always set to 1.


-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86



basant.kukreja@sun.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW




------- Additional Comments From basant.kukreja@sun.com  2008-02-15 11:48 -------
Thanks Ruediger for your pointer. It was really useful.

Regarding the function : ap_queue_info_wait_for_idler (lines 188-196)
188:        struct recycled_pool *first_pool = queue_info->recycled_pools;
189:        if (first_pool == NULL) {
190:            break;
191:        }
192:        if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), first_pool->next,
193:                              first_pool) == first_pool) {
194:            *recycled_pool = first_pool->pool;
195:            break;
196:        }

I will abbreviate queue_info->recycled_pools as qi->rp to keep it short.
Inside apr_atomic_casptr we acquire a mutex, so the sequence is really 3 steps:
1. Calculate first_pool.
2. Calculate first_pool->next and invoke apr_atomic_casptr.
3. Acquire the lock on qi->rp (inside apr_atomic_casptr).

There is a very clear race window between line 188 and line 193, that is,
between steps 2 and 3. I agree that the head pointer
&queue_info->recycled_pools itself is updated atomically, but first_pool and
first_pool->next are not protected: first_pool->next can be stale by the time
the compare-and-swap succeeds. To prove my point, here is an example:

Suppose at a particular moment the recycled_pools list is
1 --> 2 --> 3, where 1, 2, 3 are pool nodes and qi->rp = 1. Now consider the
following interleaving:
Thread 1 :
    first_pool = 1;
    first_pool->next = 2;

Before step 3 executes, that is before the lock on qi->rp is acquired, a
context switch happens.

Thread 2 :
    pops node 1 from the list; the list becomes 2 --> 3.

Thread 3 :
    pops another node (2); the list becomes 3.

Thread 2 :
    pushes node 1 back; the list becomes 1 --> 3.

Thread 1 :
    still has first_pool = 1 and first_pool->next = 2, and qi->rp is again 1.
The compare-and-swap therefore succeeds and installs 2 as the new head. But
node 2 was popped by thread 3 and is in use, so the list becomes 2 (or
2 --> 3 through the stale next pointer), i.e. it is corrupted.
        
I believe I can prove my point with a small standalone application. So far I
have used a separate mutex to protect both qi->rp and qi->rp->next. With the
attached patch, I was able to run the stress test for more than 10 hours
without any crash; without it, the crash used to happen in less than 30
minutes. Here is the patch which I tried :
---------------------------------------------------------------------------

--- orghttpd-2.2.6/server/mpm/worker/fdqueue.c	Wed Jul 25 06:13:49 2007
+++ httpd-2.2.6/server/mpm/worker/fdqueue.c	Fri Feb 15 10:57:42 2008
@@ -25,6 +25,7 @@
 struct fd_queue_info_t {
     apr_uint32_t idlers;
     apr_thread_mutex_t *idlers_mutex;
+    apr_thread_mutex_t *queue_mutex;
     apr_thread_cond_t *wait_for_idler;
     int terminated;
     int max_idlers;
@@ -36,6 +37,7 @@
     fd_queue_info_t *qi = data_;
     apr_thread_cond_destroy(qi->wait_for_idler);
     apr_thread_mutex_destroy(qi->idlers_mutex);
+    apr_thread_mutex_destroy(qi->queue_mutex);
 
     /* Clean up any pools in the recycled list */
     for (;;) {
@@ -65,6 +67,11 @@
     if (rv != APR_SUCCESS) {
         return rv;
     }
+    rv = apr_thread_mutex_create(&qi->queue_mutex, APR_THREAD_MUTEX_DEFAULT,
+                                 pool);
+    if (rv != APR_SUCCESS) {
+        return rv;
+    }
     rv = apr_thread_cond_create(&qi->wait_for_idler, pool);
     if (rv != APR_SUCCESS) {
         return rv;
@@ -93,14 +100,14 @@
         new_recycle = (struct recycled_pool *)apr_palloc(pool_to_recycle,
                                                          sizeof(*new_recycle));
         new_recycle->pool = pool_to_recycle;
-        for (;;) {
-            new_recycle->next = queue_info->recycled_pools;
-            if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
-                                  new_recycle, new_recycle->next) ==
-                new_recycle->next) {
-                break;
-            }
-        }
+        rv = apr_thread_mutex_lock(queue_info->queue_mutex);
+        if (rv != APR_SUCCESS)
+            return rv;
+        new_recycle->next = queue_info->recycled_pools;
+        queue_info->recycled_pools = new_recycle;
+        rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
+        if (rv != APR_SUCCESS)
+            return rv;
     }
 
     /* Atomically increment the count of idle workers */
@@ -182,19 +189,18 @@
 
     /* Atomically decrement the idle worker count */
     apr_atomic_dec32(&(queue_info->idlers));
-
-    /* Atomically pop a pool from the recycled list */
-    for (;;) {
+    rv = apr_thread_mutex_lock(queue_info->queue_mutex);
+    if (rv != APR_SUCCESS)
+        return rv;
+    if (queue_info->recycled_pools) {
         struct recycled_pool *first_pool = queue_info->recycled_pools;
-        if (first_pool == NULL) {
-            break;
-        }
-        if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), first_pool->next,
-                              first_pool) == first_pool) {
-            *recycled_pool = first_pool->pool;
-            break;
-        }
+        queue_info->recycled_pools = first_pool->next;
+        *recycled_pool = first_pool->pool;
+        first_pool->next = NULL;
     }
+    rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
+    if (rv != APR_SUCCESS)
+        return rv;
 
     if (queue_info->terminated) {
         return APR_EOF;
---------------------------------------------------------------------------

If you agree that there is a clear race condition, then to correct it I have
the following suggestions:
(a) Use a dedicated pool for each worker thread; this avoids any locking. It
will perform better, but may require a little more memory in situations where
the worker threads are not fully used.
(b) Use some technique other than a recycled pool list which avoids the race
condition.

I am in favour of option (a) until some good idea for (b) comes to my mind.
If you agree with (a), I can work on it and generate a patch.

Note: I also believe that the crash will happen on Linux. I never ran for more
than 1 hour on Linux; I will try that tonight.




DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86






------- Additional Comments From basant.kukreja@sun.com  2008-02-21 18:24 -------
A few more updates:
* These crashes probably also exist on Linux (64-bit), but I can't say for
sure. I saw 3 crashes so far; I got a core dump only once, and the stack trace
from that core dump did not make much sense to me, so I cannot confirm whether
the bug reproduces on Linux. (Linux is 64-bit Fedora 8 with 64-bit Apache.)

On Solaris, I tried the following things:
* Replaced apr_atomic_casptr with Solaris's native atomic_cas_ptr, but the
result remained the same; I still saw the crashes. This suggests it may not
be an APR bug.
* If I replace the apr_atomic_casptr call with a mutex-protected equivalent
but keep the for loop, the crashes disappear.
---------------------------------- ap_queue_info_set_idle-------------
            if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
                                  new_recycle, new_recycle->next) ==
                new_recycle->next) {
                break;
            }
---------------------------------- replace with -----------------------
            rv = apr_thread_mutex_lock(queue_info->queue_mutex);
            if (queue_info->recycled_pools == new_recycle->next) {
                queue_info->recycled_pools = new_recycle;
                success = 1;
            }
            rv = apr_thread_mutex_unlock(queue_info->queue_mutex);


---------------------------------- ap_queue_info_wait_for_idler --------------
        if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
first_pool->next,
                              first_pool) == first_pool) {
            *recycled_pool = first_pool->pool;
            break;
        }
---------------------------------- replace with ---------------------------
        rv = apr_thread_mutex_lock(queue_info->queue_mutex);
        if (queue_info->recycled_pools == first_pool) {
            queue_info->recycled_pools = first_pool->next;
            success = 1;
        }
        rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
----------------------------------




DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86






------- Additional Comments From rpluem@apache.org  2008-02-14 14:44 -------
As you wrote Solaris on Sun hardware, I suppose you mean SPARC. Have you
checked whether the crashes happen with the same Solaris version on x86?
Background of the question: ap_queue_info_wait_for_idler uses atomics whose
implementation depends on the hardware architecture. Non-functional atomics
could be a source of concurrency problems under load.



DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86



rpluem@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO




------- Additional Comments From rpluem@apache.org  2008-02-14 12:25 -------
I assume that the ptrans pool somehow gets corrupted. I guess it is used by two
threads in parallel, which could lead to corruption, since pools as such are not
thread safe. So I think a good starting point for further investigation would be

ap_queue_info_wait_for_idler in mpm/worker/fdqueue.c

or the lines 731 - 740 in worker.c:

            if (ptrans == NULL) {
                /* we can't use a recycled transaction pool this time.
                 * create a new transaction pool */
                apr_allocator_t *allocator;

                apr_allocator_create(&allocator);
                apr_allocator_max_free_set(allocator, ap_max_mem_free);
                apr_pool_create_ex(&ptrans, pconf, NULL, allocator);
                apr_allocator_owner_set(allocator, ptrans);
            }





DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86






------- Additional Comments From basant.kukreja@sun.com  2008-02-18 11:33 -------
Regarding the example given in comments # 12, I need to correct myself. I
agree with you that the example is not valid for worker implementation because
there is single thread which pop the nodes and multiple threads which pushes
the node.  ( ap_queue_info_wait_for_idler is not thread safe but it is not
called by multiple threads. It is only invoked by single listener_thread. )

I could not yet think of any race condition in which single popping thread
and several pushing thread cause recycle_pool list corruption.

I am still working on it to find the real cause of the crashes.






DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86






------- Additional Comments From basant.kukreja@sun.com  2008-02-14 00:03 -------
With Ruediger's patch, I still got the crash (crash #2). Here is the debug information:

t@314 (l@314) terminated by signal SEGV (Segmentation Fault)
Current function is apr_sockaddr_ip_get
  104       *addr = apr_palloc(sockaddr->pool, sockaddr->addr_str_len);
(dbx) where
current thread: t@314
=>[1] apr_sockaddr_ip_get(addr = 0x1ebb4b0, sockaddr = (nil)), line 104 in
"sockaddr.c"
  [2] core_create_conn(ptrans = 0x1ebb3b8, server = 0x4c0200, csd = 0x1ebb728,
id = 311, sbh = 0x1ebb458, alloc = 0x2128b58), line 3895 in "core.c"
  [3] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x4602c3
  [4] process_socket(p = 0x1ebb3b8, sock = 0x1ebb728, my_child_num = 0,
my_thread_num = 311, bucket_alloc = 0x2128b58), line 566 in "worker.c"
  [5] worker_thread(thd = 0x7195c8, dummy = 0x6e2310), line 923 in "worker.c"
  [6] dummy_worker(opaque = 0x7195c8), line 142 in "thread.c"
  [7] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
  [8] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0
(dbx) p *addr
*addr = (nil)
(dbx) up
Current function is core_create_conn
 3895       apr_sockaddr_ip_get(&c->local_ip, c->local_addr);
(dbx) p *c
*c = {
    pool                  = 0x1ebb3b8
    base_server           = (nil)
    vhost_lookup_data     = (nil)
    local_addr            = (nil)
    remote_addr           = (nil)
    remote_ip             = (nil)
    remote_host           = (nil)
    remote_logname        = (nil)
    aborted               = 0
    keepalive             = AP_CONN_UNKNOWN
    double_reverse        = 0
    keepalives            = 0
    local_ip              = (nil)
    local_host            = (nil)
    id                    = 0
    conn_config           = 0x1ebb508
    notes                 = 0x1ebba48
    input_filters         = (nil)
    output_filters        = (nil)
    sbh                   = 0x1ebb458
    bucket_alloc          = (nil)
    cs                    = (nil)
    data_in_input_filters = 0
}
(dbx) dump
alloc = 0x2128b58
rv = 0
ptrans = 0x1ebb3b8
server = 0x4c0200
sbh = 0x1ebb458
c = 0x1ebb460
id = 311
csd = 0x1ebb728
(dbx) p csd
csd = 0x1ebb728
(dbx) p *(struct apr_socket_t*) csd
*((struct apr_socket_t *) csd) = {
    pool                    = (nil)
    socketdes               = 0
    type                    = 0
    protocol                = 0
    local_addr              = (nil)
    remote_addr             = (nil)
    timeout                 = 0
    local_port_unknown      = 0
    local_interface_unknown = 0
    remote_addr_unknown     = 0
    options                 = 0
    inherit                 = 0
    userdata                = (nil)
}
 




DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on 64 bit solaris x86






------- Additional Comments From basant.kukreja@sun.com  2008-02-12 17:14 -------
I tried to debug the crash:
=>[1] allocator_free(allocator = 0x101f870, node = (nil)), line 331 in "apr_pools.c"
  [2] apr_pool_clear(pool = 0x102fb88), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0x1020550, b = 0x101f9e8), line 899 in
"core_filters.c"

In ap_core_output_filter, the crash happens when apr_pool_clear is called for
the deferred_write_pool:
            apr_pool_clear(ctx->deferred_write_pool);
On further investigation, I found that for ctx->deferred_write_pool, pool->ref
points to pool->next, i.e.
pool->ref == &pool->next

Thus in apr_pool_clear :
    if (active->next == active)
        return;

    *active->ref = NULL; // ---> this causes active->next to be set to NULL
                         // because active->ref points to active->next
    allocator_free(pool->allocator, active->next);

This situation does not arise with a normal single connection; it happens only
under stress. With a normal connection, active->next == active, and
apr_pool_clear returns early (when called for ctx->deferred_write_pool).




DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86






------- Additional Comments From rpluem@apache.org  2008-02-14 14:55 -------
(In reply to comment #10)
> As you wrote Solaris on Sun I suppose you mean on SPARC. Have you checked if the
> crashes happen with the same Solaris version on x86? Background of the question:

Oops, my fault: you already said that you are using x86. Nevertheless, does the
same happen on SPARC with the same Solaris version, or if you compile with
--enable-nonportable-atomics=no ?




DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86



basant.kukreja@sun.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Worker mpm crashes (SEGV)   |Worker mpm crashes (SEGV)
                   |under stress with static    |under stress with static
                   |workload on 64 bit solaris  |workload on solaris x86
                   |x86                         |




------- Additional Comments From basant.kukreja@sun.com  2008-02-12 19:15 -------
Crashes are happening with 32-bit Apache too, therefore I am changing the summary.
