Posted to bugs@httpd.apache.org by bu...@apache.org on 2008/02/13 01:47:40 UTC
DO NOT REPLY [Bug 44402] New: - Worker mpm crashes (SEGV) under stress with static workload on 64 bit solaris x86
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=44402>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=44402
Summary: Worker mpm crashes (SEGV) under stress with static
workload on 64 bit solaris x86
Product: Apache httpd-2
Version: 2.2.6
Platform: Sun
OS/Version: Solaris
Status: NEW
Severity: blocker
Priority: P2
Component: worker
AssignedTo: bugs@httpd.apache.org
ReportedBy: basant.kukreja@sun.com
I am running the SPECweb99 static content workload with httpd 2.2.6 on Solaris
Nevada (snv_79) and I am seeing several crashes. Typically a crash reproduces
within 10 minutes. Here are the details:
Apache version : httpd-2.2.6
Simultaneous connection : 1000
Hardware : X4100 Server (4 core 2.8 GHz)
CPU : Only single core is enabled
Architecture : x86_64
httpd.conf contains :
<IfModule worker.c>
ListenBackLog 50000
StartServers 2
ThreadLimit 500
ThreadsPerChild 500
MinSpareThreads 100
MaxSpareThreads 100
ThreadsPerChild 500
MaxClients 1000
MaxRequestsPerChild 0
</IfModule>
Listen 192.168.21.1:80
Listen 192.168.22.1:80
Configure options:
CFLAGS="-g -mt -m64 -KPIC " ./configure --prefix=<prefix_path> --with-mpm=worker
--enable-modules=all --with-ssl=/usr/sfw --enable-mods-shared=all --enable-cgi
--enable-threads && gmake && gmake install
Crash 1 (the most common stack trace):
(dbx) where
current thread: t@76
=>[1] allocator_free(allocator = 0x101f870, node = (nil)), line 331 in "apr_pools.c"
[2] apr_pool_clear(pool = 0x102fb88), line 710 in "apr_pools.c"
[3] ap_core_output_filter(f = 0x1020550, b = 0x101f9e8), line 899 in
"core_filters.c"
[4] ap_pass_brigade(next = 0x1020550, bb = 0x101f9e8), line 526 in "util_filter.c"
[5] logio_out_filter(f = 0x10204e0, bb = 0x101f9e8), line 135 in "mod_logio.c"
[6] ap_pass_brigade(next = 0x10204e0, bb = 0x101f9e8), line 526 in "util_filter.c"
[7] ap_flush_conn(c = 0x101fd00), line 84 in "connection.c"
[8] ap_lingering_close(c = 0x101fd00), line 123 in "connection.c"
[9] process_socket(p = 0x101f968, sock = 0x101f9e8, my_child_num = 1,
my_thread_num = 227, bucket_alloc = 0x1029a88), line 545 in "worker.c"
[10] worker_thread(thd = 0x5bed38, dummy = 0x6dbac0), line 894 in "worker.c"
[11] dummy_worker(opaque = 0x5bed38), line 142 in "thread.c"
[12] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
[13] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0
Crash 2 :
(dbx) where
current thread: t@363
=>[1] apr_palloc(pool = 0x21680007952225ff, size = 18446744073323675656U), line
601 in "apr_pools.c"
[2] apr_sockaddr_ip_get(addr = 0xcda3d0, sockaddr = 0x42d790), line 104 in
"sockaddr.c"
[3] core_create_conn(ptrans = 0xcda2d8, server = 0x4bf600, csd = 0xcda358, id
= 360, sbh = 0xcda378, alloc = 0xd147e8), line 3895 in "core.c"
[4] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x45fe03
[5] process_socket(p = 0xcda2d8, sock = 0xcda358, my_child_num = 0,
my_thread_num = 360, bucket_alloc = 0xd147e8), line 542 in "worker.c"
[6] worker_thread(thd = 0x7192f8, dummy = 0x7e45a0), line 894 in "worker.c"
[7] dummy_worker(opaque = 0x7192f8), line 142 in "thread.c"
[8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
[9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0
I tried httpd-2.2.8 too but got a similar crash:
Crash 3 (with httpd-2.2.8):
=>[1] apr_palloc(pool = 0x226800079e7a25ff, size = 18446744073323675656U), line
630 in "apr_pools.c"
[2] apr_sockaddr_ip_get(addr = 0xc57060, sockaddr = 0x42dab8), line 104 in
"sockaddr.c"
[3] core_create_conn(ptrans = 0xc56f68, server = 0x4c0378, csd = 0xc56fe8, id
= 951, sbh = 0xc57008, alloc = 0xc58f78), line 3895 in "core.c"
[4] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x4604e3
[5] process_socket(p = 0xc56f68, sock = 0xc56fe8, my_child_num = 1,
my_thread_num = 451, bucket_alloc = 0xc58f78), line 542 in "worker.c"
[6] worker_thread(thd = 0x870c88, dummy = 0x7e7e30), line 894 in "worker.c"
[7] dummy_worker(opaque = 0x870c88), line 142 in "thread.c"
[8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
[9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0
The prefork MPM works just fine.
--
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86
Posted by bu...@apache.org.
rpluem@apache.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |NEEDINFO
------- Additional Comments From rpluem@apache.org 2008-02-13 13:03 -------
First guess for your last crash in comment #4 (all line numbers refer to 2.2.8):
lr->accept_func(&csd, lr, ptrans); (line 742 in worker.c) fails with
rv != APR_SUCCESS, but with a non-NULL value for csd.
In contrast to the prefork code, we don't check for this situation:
Lines 621 - 631 of prefork.c:
status = lr->accept_func(&csd, lr, ptrans);
SAFE_ACCEPT(accept_mutex_off());      /* unlock after "accept" */
if (status == APR_EGENERAL) {
    /* resource shortage or should-not-occur occured */
    clean_child_exit(1);
}
else if (status != APR_SUCCESS) {
    continue;
}
Maybe we need to do a continue in the worker case as well, or we need to do
something like the following:
Index: server/mpm/worker/worker.c
===================================================================
--- server/mpm/worker/worker.c  (Revision 627576)
+++ server/mpm/worker/worker.c  (Arbeitskopie)
@@ -743,6 +743,9 @@
         /* later we trash rv and rely on csd to indicate success/failure */
         AP_DEBUG_ASSERT(rv == APR_SUCCESS || !csd);

+        if (rv != APR_SUCCESS) {
+            csd = NULL;
+        }
         if (rv == APR_EGENERAL) {
             /* E[NM]FILE, ENOMEM, etc */
             resource_shortage = 1;
DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86
Posted by bu...@apache.org.
------- Additional Comments From rpluem@apache.org 2008-02-16 14:23 -------
I think I have to correct myself on two points.
1. On APR trunk there are better implementations of apr_atomic_casptr which no
longer use a mutex, but native platform processor / OS features. So, in
contrast to my first assumption, your patch could cause a performance
degradation on trunk, which would be bad.
2. The race scenario you described cannot happen in this way, because it assumes
that multiple threads pop pools from the list in parallel. This is not the
case, as only the listener thread does this. What does happen in parallel is:
- multiple pushes to the list
- (multiple) pushes to the list and a pop
OTOH I still believe that there is some kind of race here, as your patch showed
that the error goes away when the locking / syncing is changed. So maybe it's
just a different scenario (one that I haven't figured out so far), or there is
a bug in apr_atomic_casptr.
Do the same crashes happen with trunk?
DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86
Posted by bu...@apache.org.
------- Additional Comments From basant.kukreja@sun.com 2008-02-12 19:17 -------
Here is the crash from the 32-bit Apache:
=>[1] allocator_free(allocator = 0x8aae018, node = (nil)), line 331 in "apr_pools.c"
[2] apr_pool_clear(pool = 0x8b629b8), line 710 in "apr_pools.c"
[3] ap_core_output_filter(f = 0x8aae870, b = 0x8aae0e0), line 899 in
"core_filters.c"
[4] ap_pass_brigade(next = 0x8aae870, bb = 0x8aae0e0), line 526 in "util_filter.c"
[5] logio_out_filter(f = 0x8aae830, bb = 0x8aae0e0), line 135 in "mod_logio.c"
[6] ap_pass_brigade(next = 0x8aae830, bb = 0x8aae0e0), line 526 in "util_filter.c"
[7] ap_flush_conn(c = 0x8aae390), line 84 in "connection.c"
[8] ap_lingering_close(c = 0x8aae390), line 123 in "connection.c"
[9] process_socket(p = 0x8aae0a0, sock = 0x8aae0e0, my_child_num = 1,
my_thread_num = 249, bucket_alloc = 0x8b5c9a0), line 545 in "worker.c"
[10] worker_thread(thd = 0x81a6788, dummy = 0x831f5a0), line 894 in "worker.c"
[11] dummy_worker(opaque = 0x81a6788), line 142 in "thread.c"
[12] _thr_setup(0xf004d200), at 0xfec6f282
[13] _lwp_start(0xfee7ddb9, 0xfee7f55e, 0xfffffff6, 0x0, 0x1, 0xfeea1984), at
0xfec6f4e0
DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86
Posted by bu...@apache.org.
------- Additional Comments From basant.kukreja@sun.com 2008-02-14 14:15 -------
Thanks Ruediger for your suggestion; I will explore based on it.
Meanwhile, here is a 3rd type of crash (with your patch applied):
t@13 (l@13) terminated by signal SEGV (Segmentation Fault)
Current function is apr_pool_cleanup_kill
2045 c = p->cleanups;
(dbx) where
current thread: t@13
=>[1] apr_pool_cleanup_kill(p = 0xa0, data = 0x195b888, cleanup_fn =
0xfffffd7fff223540 = &`libapr-1.so.0.2.11`sockets.c`socket_cleanup(void *sock)),
line 2045 in "apr_pools.c"
[2] apr_pool_cleanup_run(p = 0xa0, data = 0x195b888, cleanup_fn =
0xfffffd7fff223540 = &`libapr-1.so.0.2.11`sockets.c`socket_cleanup(void *sock)),
line 2088 in "apr_pools.c"
[3] apr_socket_close(thesocket = 0x195b888), line 149 in "sockets.c"
[4] ap_lingering_close(c = 0x17407f0), line 135 in "connection.c"
[5] process_socket(p = 0x1740748, sock = 0x195b888, my_child_num = 1,
my_thread_num = 10, bucket_alloc = 0x195b728), line 569 in "worker.c"
[6] worker_thread(thd = 0x52bb48, dummy = 0x4f3480), line 951 in "worker.c"
[7] dummy_worker(opaque = 0x52bb48), line 142 in "thread.c"
[8] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
[9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0
(dbx) up
Current function is apr_pool_cleanup_run
2088 apr_pool_cleanup_kill(p, data, cleanup_fn);
(dbx) up
Current function is apr_socket_close
149 return apr_pool_cleanup_run(thesocket->pool, thesocket, socket_cleanup);
(dbx) p *thesocket
*thesocket = {
pool = 0xa0
socketdes = 26588968
type = 0
protocol = 26588928
local_addr = 0x195b7e8
remote_addr = 0x1741158
timeout = 24383832
local_port_unknown = 4895848
local_interface_unknown = 0
remote_addr_unknown = 0
options = 0
inherit = 0
userdata = (nil)
}
(dbx) dump
thesocket = 0x195b888
(dbx) up
Current function is ap_lingering_close
135 apr_socket_close(csd);
(dbx) dump
timeup = 0
dummybuf = ""
c = 0x17407f0
nbytes = 4294967296U
csd = 0x195b888
(dbx) p *c
*c = {
pool = 0x1740748
base_server = 0x4c0300
vhost_lookup_data = (nil)
local_addr = 0x195b8d8
remote_addr = 0x195ba18
remote_ip = 0x1740f88 "192.168.22.2"
remote_host = (nil)
remote_logname = (nil)
aborted = 0
keepalive = AP_CONN_UNKNOWN
double_reverse = 0
keepalives = 0
local_ip = 0x1740f78 "192.168.22.1"
local_host = (nil)
id = 510
conn_config = 0x1740898
notes = 0x1740dd8
input_filters = 0x1740fa8
output_filters = 0x1740fd0
sbh = 0x17407e8
bucket_alloc = 0x195b728
cs = (nil)
data_in_input_filters = 0
}
(dbx) p *((`srclib/apr/include/arch/unix/apr_arch_networkio.h`struct apr_socket_t*)csd)
*((struct apr_socket_t *) csd) = {
pool = 0xa0
socketdes = 26588968
type = 0
protocol = 26588928
local_addr = 0x195b7e8
remote_addr = 0x1741158
timeout = 24383832
local_port_unknown = 4895848
local_interface_unknown = 0
remote_addr_unknown = 0
options = 0
inherit = 0
userdata = (nil)
}
(dbx) etworkio.h`struct apr_socket_t*)csd->local_addr <
dbx: can't find field "local_addr" in "*(csd)"
(dbx) p ((`srclib/apr/include/arch/unix/apr_arch_networkio.h`struct apr_socke >
((struct apr_socket_t *) csd)->local_addr = 0x195b7e8
(dbx) p *((`srclib/apr/include/arch/unix/apr_arch_networkio.h`struct apr_sock >
*((struct apr_socket_t *) csd)->local_addr = {
pool = 0xa0
hostname = 0x195b728 "H^Gt^A"
servname = 0x195b700 ""
port = 0
family = 0
salen = 24383744U
ipaddr_len = 0
addr_str_len = 24383744
ipaddr_ptr = 0xfffffd7fff2e8010
next = (nil)
sa = {
sin = {
sin_family = 0
sin_port = 0
sin_addr = {
S_un = {
S_un_b = {
s_b1 = '\0'
s_b2 = '\0'
s_b3 = '\0'
s_b4 = '\0'
}
S_un_w = {
s_w1 = 0
s_w2 = 0
}
S_addr = 0
}
}
sin_zero = ""
}
sin6 = {
sin6_family = 0
sin6_port = 0
sin6_flowinfo = 0
sin6_addr = {
_S6_un = {
_S6_u8 = ""
_S6_u32 = (0, 0, 4373928U, 0)
__S6_align = 0
}
}
sin6_scope_id = 26588968U
__sin6_src_id = 0
}
sas = {
ss_family = 0
_ss_pad1 = ""
_ss_align = 0.0
_ss_pad2 = "xxB"
}
}
}
DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86
Posted by bu...@apache.org.
------- Additional Comments From basant.kukreja@sun.com 2008-02-12 19:43 -------
Here is the debug information from a crash of the 32-bit Apache:
t@414 (l@414) terminated by signal SEGV (Segmentation Fault)
Current function is apr_sockaddr_ip_get
104 *addr = apr_palloc(sockaddr->pool, sockaddr->addr_str_len);
(dbx) where
current thread: t@414
=>[1] apr_sockaddr_ip_get(addr = 0x974a3d0, sockaddr = (nil)), line 104 in
"sockaddr.c"
[2] core_create_conn(ptrans = 0x974a348, server = 0x80d9020, csd = 0x974a388,
id = 411, sbh = 0x974a398, alloc = 0x9788670), line 3895 in "core.c"
[3] ap_run_create_connection(0x974a348, 0x80d9020, 0x974a388, 0x19b,
0x974a398, 0x9788670), at 0x8090ae8
[4] process_socket(p = 0x974a348, sock = 0x974a388, my_child_num = 0,
my_thread_num = 411, bucket_alloc = 0x9788670), line 542 in "worker.c"
[5] worker_thread(thd = 0x83a6ff8, dummy = 0x8125e80), line 894 in "worker.c"
[6] dummy_worker(opaque = 0x83a6ff8), line 142 in "thread.c"
[7] _thr_setup(0xf008e200), at 0xfec6f282
[8] _lwp_start(0x0, 0xfee8410c, 0xe451bef8, 0xe451bef8, 0x8081a83, 0x974a3d0),
at 0xfec6f4e0
(dbx) p sockaddr
sockaddr = (nil)
(dbx) where
current thread: t@414
=>[1] apr_sockaddr_ip_get(addr = 0x974a3d0, sockaddr = (nil)), line 104 in
"sockaddr.c"
[2] core_create_conn(ptrans = 0x974a348, server = 0x80d9020, csd = 0x974a388,
id = 411, sbh = 0x974a398, alloc = 0x9788670), line 3895 in "core.c"
[3] ap_run_create_connection(0x974a348, 0x80d9020, 0x974a388, 0x19b,
0x974a398, 0x9788670), at 0x8090ae8
[4] process_socket(p = 0x974a348, sock = 0x974a388, my_child_num = 0,
my_thread_num = 411, bucket_alloc = 0x9788670), line 542 in "worker.c"
[5] worker_thread(thd = 0x83a6ff8, dummy = 0x8125e80), line 894 in "worker.c"
[6] dummy_worker(opaque = 0x83a6ff8), line 142 in "thread.c"
[7] _thr_setup(0xf008e200), at 0xfec6f282
[8] _lwp_start(0x0, 0xfee8410c, 0xe451bef8, 0xe451bef8, 0x8081a83, 0x974a3d0),
at 0xfec6f4e0
(dbx) up
Current function is core_create_conn
3895 apr_sockaddr_ip_get(&c->local_ip, c->local_addr);
(dbx) p *c
*c = {
pool = 0x974a348
base_server = (nil)
vhost_lookup_data = (nil)
local_addr = (nil)
remote_addr = (nil)
remote_ip = (nil)
remote_host = (nil)
remote_logname = (nil)
aborted = 0
keepalive = AP_CONN_UNKNOWN
double_reverse = 0
keepalives = 0
local_ip = (nil)
local_host = (nil)
id = 0
conn_config = 0x974a400
notes = 0x974a6a0
input_filters = (nil)
output_filters = (nil)
sbh = 0x974a398
bucket_alloc = (nil)
cs = (nil)
data_in_input_filters = 0
}
(dbx) dump
alloc = 0x9788670
rv = 0
ptrans = 0x974a348
server = 0x80d9020
sbh = 0x974a398
c = 0x974a3a0
id = 411
csd = 0x974a388
(dbx) p *((`srclib/apr/include/arch/unix/apr_arch_networkio.h`struct apr_socket_t*)csd)
*((struct apr_socket_t *) csd) = {
pool = (nil)
socketdes = 158893680
type = -17726080
protocol = 134660748
local_addr = (nil)
remote_addr = 0x19b
timeout = 158638920LL
local_port_unknown = 0
local_interface_unknown = 0
remote_addr_unknown = 0
options = 0
inherit = 0
userdata = (nil)
}
Please let me know if any other information is needed.
DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86
Posted by bu...@apache.org.
------- Additional Comments From rpluem@apache.org 2008-02-16 00:25 -------
(In reply to comment #12)
> If you agree that there is a clear race condition then to correct this, I have
> following suggestion :
> (a) Use a dedicated pool for each worker thread, this will avoid any locking.
> It will perform better but may require little more memory in those situations
> when worker threads are not fully used.
> (b) Use some other technique other than a recycled pool list which avoids race
> conditions.
>
> I am in favour of option (a) until some good idea for (b) comes to my mind.
> If you agree with (a) then I can work and generate a patch.
>
> Note : Also I believe that the crash will happen in linux too. I never ran more
> than 1 hour in linux. I will try that tonight.
>
Thank you for your thorough investigation. I agree with you that we have the
described race conditions here. We have a similar race in the event MPM.
Next steps:
1. Bring your patch above into trunk. Currently I see no significant
   performance loss compared to the current code, as we are using a mutex there
   as well; we only increase the time during which we hold the lock. I don't
   know right now when I will find the cycles to apply the patch to trunk, but
   if you could attach a trunk version of your patch to this report, it would
   be a big help.
2. Move the further discussion regarding options a) and b) to
   dev@httpd.apache.org and let's wait for its results to decide how to move
   along and improve the situation here in the long run.
DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86
Posted by bu...@apache.org.
basant.kukreja@sun.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEEDINFO |NEW
------- Additional Comments From basant.kukreja@sun.com 2008-02-13 22:56 -------
I did the stress test with the patch you suggested. After your patch, I still
got the 1st crash. If it ever crashes with the second stack trace, I will
update the bug.
Here is some more information about the 1st crash:
* I am able to reproduce the crash on Solaris 10 update 1 (on a different
machine) too. It took around 4 hours of stress before I got the crash on
Solaris 10, while it takes less than 30 minutes to reproduce on Solaris Nevada.
It was crash 1 (allocator_free with node = null), without your patch.
Here is more information on crash 1:
(dbx) where
current thread: t@21
=>[1] allocator_free(allocator = 0x8afe2e0, node = (nil)), line 331 in "apr_pools.c"
[2] apr_pool_clear(pool = 0xa0d01c0), line 710 in "apr_pools.c"
[3] ap_core_output_filter(f = 0xa0b28c8, b = 0xa0b2a08), line 899 in
"core_filters.c"
[4] ap_pass_brigade(next = 0xa0b28c8, bb = 0xa0b2a08), line 526 in "util_filter.c"
[5] logio_out_filter(f = 0xa0b2888, bb = 0xa0b2a08), line 135 in "mod_logio.c"
[6] ap_pass_brigade(next = 0xa0b2888, bb = 0xa0b2a08), line 526 in "util_filter.c"
[7] ap_flush_conn(c = 0xa0b23e8), line 84 in "connection.c"
[8] ap_lingering_close(c = 0xa0b23e8), line 123 in "connection.c"
[9] process_socket(p = 0x8afe368, sock = 0x8aff660, my_child_num = 1,
my_thread_num = 18, bucket_alloc = 0xa0be178), line 545 in "worker.c"
[10] worker_thread(thd = 0x81487d8, dummy = 0x8117b30), line 894 in "worker.c"
[11] dummy_worker(opaque = 0x81487d8), line 142 in "thread.c"
[12] _thr_setup(0xfe244800), at 0xfeccf92e
[13] _lwp_start(), at 0xfeccfc10
(dbx) up
Current function is apr_pool_clear
710 allocator_free(pool->allocator, active->next);
(dbx) p *active
*active = {
next = (nil)
ref = 0xa0d01a8
index = 1U
free_index = 0
first_avail = 0xa0d01f8 "\xc0^A^M\n\xfc^A^M\n\xfc^A^M\nx\xe1^K\n"
endp = 0xa0d21a8 "^A "
}
(dbx) up
Current function is ap_core_output_filter
899 apr_pool_clear(ctx->deferred_write_pool);
(dbx) p *ctx
*ctx = {
b = (nil)
deferred_write_pool = 0xa0d01c0
}
(dbx) p *ctx->deferred_write_pool
*ctx->deferred_write_pool = {
parent = 0x8afe368
child = (nil)
sibling = 0xa0c6198
ref = 0x8afe36c
cleanups = (nil)
free_cleanups = (nil)
allocator = 0x8afe2e0
subprocesses = (nil)
abort_fn = (nil)
user_data = (nil)
tag = 0x80bfd1c "deferred_write"
active = 0xa0d01a8
self = 0xa0d01a8
self_first_avail = 0xa0d01f8 "\xc0^A^M\n\xfc^A^M\n\xfc^A^M\nx\xe1^K\n"
}
(dbx) p *c
*c = {
pool = 0x8afe368
base_server = 0x80e6bf8
vhost_lookup_data = (nil)
local_addr = 0x8aff698
remote_addr = 0x8aff7c0
remote_ip = 0xa0b2850 "192.168.11.1"
remote_host = (nil)
remote_logname = (nil)
aborted = 0
keepalive = AP_CONN_KEEPALIVE
double_reverse = 0
keepalives = 1
local_ip = 0xa0b2840 "192.168.11.2"
local_host = (nil)
id = 518
conn_config = 0xa0b2448
notes = 0xa0b26e8
input_filters = 0xa0b2870
output_filters = 0xa0b2888
sbh = 0xa0b23e0
bucket_alloc = 0xa0be178
cs = (nil)
data_in_input_filters = 0
}
By putting in some printfs I figured out the following. In apr_pool_clear (when
invoked for deferred_write_pool):
...
active = pool->active = pool->self;
active->first_avail = pool->self_first_avail;
if (active->next == active)
    return;
active->next should normally form a circular linked list. What is happening in
some cases is that active->next points to something else while active->ref
still points at active->next. I put a printf on active->next before it is set
to NULL. For a particular crash, here is my debugging session; I found that
active->next was set to 0x20e8810 before it was set to NULL.
(dbx) up
Current function is apr_pool_clear
774 allocator_free(pool->allocator, active->next);
(dbx) up
Current function is ap_core_output_filter
923 apr_pool_clear(ctx->deferred_write_pool);
(dbx) p (struct apr_memnode_t*)0x20e8810   -----> this was active->next before it was set to NULL
(struct apr_memnode_t *) 0x20e8810 = 0x20e8810
(dbx) p *(struct apr_memnode_t*)0x20e8810
*((struct apr_memnode_t *) 0x20e8810) = {
next = 0x288c5b0
ref = 0x20e8810
index = 1U
free_index = 0
first_avail = 0x20e9eb0 "GET /file_set/dir00104/class1_3 HTTP/1.0"
endp = 0x20ea810 "^A "
}
(dbx) down
Current function is apr_pool_clear
774 allocator_free(pool->allocator, active->next);
(dbx) p active
active = 0x20e27e0
(dbx) p *((struct apr_memnode_t*)0x20e8810)->next
*((struct apr_memnode_t *) 0x20e8810)->next = {
next = 0x20e07d0
ref = 0x20e07d0
index = 1U
free_index = 0
first_avail = 0x288d008 ""
endp = 0x288e5b0 "^A "
}
(dbx) p active
active = 0x20e27e0
(dbx) p *(((struct apr_memnode_t*)0x20e8810)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next = {
next = 0x28905d0
ref = 0x288c5b0
index = 1U
free_index = 0
first_avail = 0x20e2738 ""
endp = 0x20e27d0 "^A "
}
(dbx) p *((((struct apr_memnode_t*)0x20e8810)->next)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next->next = {
next = 0x288e5c0
ref = 0x28905d0
index = 1U
free_index = 0
first_avail = 0x2890668 "\xf8^E\x89^B"
endp = 0x28925d0 "^Q^P"
}
(dbx) p *(((((struct apr_memnode_t*)0x20e8810)->next)->next)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next->next->next = {
next = (nil)
ref = (nil)
index = 1U
free_index = 0
first_avail = 0x288e5e8 "`^_"
endp = 0x28905c0 "^A "
}
On further debugging, I figured out that ap_core_output_filter is typically
called 4 times per request. The crash always happens in the 4th invocation. It
seems that the pool gets corrupted somewhere after the 3rd invocation (after
ap_core_output_filter returns) and before ap_core_output_filter is entered the
4th time (when ap_lingering_close is in the call stack). Also,
conn->keepalives was always set to 1.
DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86
Posted by bu...@apache.org.
basant.kukreja@sun.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEEDINFO |NEW
------- Additional Comments From basant.kukreja@sun.com 2008-02-15 11:48 -------
Thanks Ruediger for your pointer. It was really useful.
Regarding the function ap_queue_info_wait_for_idler (lines 188-196):
188:     struct recycled_pool *first_pool = queue_info->recycled_pools;
189:     if (first_pool == NULL) {
190:         break;
191:     }
192:     if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), first_pool->next,
193:                           first_pool) == first_pool) {
194:         *recycled_pool = first_pool->pool;
195:         break;
196:     }
I will write queue_info->recycled_pools as qi->rp to keep it short. Inside
apr_atomic_casptr we acquire a mutex, so there are 3 steps:
1. Calculate first_pool.
2. Calculate first_pool->next and invoke apr_atomic_casptr.
3. Acquire the lock on qi->rp (inside apr_atomic_casptr).
There is a very clear race condition between the two statements (line 188 and
line 193), i.e. between steps 2 and 3. I agree that &queue_info->recycled_pools
itself is swapped atomically, but the next pointers (first_pool and
first_pool->next) are not protected. To prove my point, here is an example:
Suppose at a particular moment the recycled pool list is 1 --> 2 --> 3, where
1, 2 and 3 are pool nodes and qi->rp = 1. Now consider the following
interleaving:
Thread 1:
reads first_pool = 1 and first_pool->next = 2.
Before step 3 is executed, that is, before we acquire the lock on qi->rp, a
context switch happens.
Thread 2:
pops a node (1) from the list; the list becomes 2 --> 3.
Thread 3:
pops another node (2) from the list; the list becomes 3.
Thread 2:
pushes node 1 back; the list becomes 1 --> 3.
Thread 1:
resumes with first_pool = 1 and first_pool->next = 2. qi->rp is again 1, so the
atomic compare-and-swap of qi->rp from 1 to 2 succeeds. But node 1's next is
now 3, not 2, so the head becomes node 2, which was already popped, and the
list is corrupted (it becomes 2, or 2 --> 3).
I believe I can prove my point with a sample standalone application. So far I
have used a separate mutex to protect both qi->rp and qi->rp->next. With the
attached patch, I was able to run the stress test for more than 10 hours
without any crash; without it, the crash used to happen in less than 30
minutes. Here is the patch which I tried:
---------------------------------------------------------------------------
--- orghttpd-2.2.6/server/mpm/worker/fdqueue.c  Wed Jul 25 06:13:49 2007
+++ httpd-2.2.6/server/mpm/worker/fdqueue.c     Fri Feb 15 10:57:42 2008
@@ -25,6 +25,7 @@
 struct fd_queue_info_t {
     apr_uint32_t idlers;
     apr_thread_mutex_t *idlers_mutex;
+    apr_thread_mutex_t *queue_mutex;
     apr_thread_cond_t *wait_for_idler;
     int terminated;
     int max_idlers;
@@ -36,6 +37,7 @@
     fd_queue_info_t *qi = data_;
     apr_thread_cond_destroy(qi->wait_for_idler);
     apr_thread_mutex_destroy(qi->idlers_mutex);
+    apr_thread_mutex_destroy(qi->queue_mutex);

     /* Clean up any pools in the recycled list */
     for (;;) {
@@ -65,6 +67,11 @@
     if (rv != APR_SUCCESS) {
         return rv;
     }
+    rv = apr_thread_mutex_create(&qi->queue_mutex, APR_THREAD_MUTEX_DEFAULT,
+                                 pool);
+    if (rv != APR_SUCCESS) {
+        return rv;
+    }
     rv = apr_thread_cond_create(&qi->wait_for_idler, pool);
     if (rv != APR_SUCCESS) {
         return rv;
@@ -93,14 +100,14 @@
         new_recycle = (struct recycled_pool *)apr_palloc(pool_to_recycle,
                                                          sizeof(*new_recycle));
         new_recycle->pool = pool_to_recycle;
-        for (;;) {
-            new_recycle->next = queue_info->recycled_pools;
-            if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
-                                  new_recycle, new_recycle->next) ==
-                new_recycle->next) {
-                break;
-            }
-        }
+        rv = apr_thread_mutex_lock(queue_info->queue_mutex);
+        if (rv != APR_SUCCESS)
+            return rv;
+        new_recycle->next = queue_info->recycled_pools;
+        queue_info->recycled_pools = new_recycle;
+        rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
+        if (rv != APR_SUCCESS)
+            return rv;
     }

     /* Atomically increment the count of idle workers */
@@ -182,19 +189,18 @@
     /* Atomically decrement the idle worker count */
     apr_atomic_dec32(&(queue_info->idlers));
-
-    /* Atomically pop a pool from the recycled list */
-    for (;;) {
+    rv = apr_thread_mutex_lock(queue_info->queue_mutex);
+    if (rv != APR_SUCCESS)
+        return rv;
+    if (queue_info->recycled_pools) {
         struct recycled_pool *first_pool = queue_info->recycled_pools;
-        if (first_pool == NULL) {
-            break;
-        }
-        if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
-                              first_pool->next,
-                              first_pool) == first_pool) {
-            *recycled_pool = first_pool->pool;
-            break;
-        }
+        queue_info->recycled_pools = first_pool->next;
+        *recycled_pool = first_pool->pool;
+        first_pool->next = NULL;
     }
+    rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
+    if (rv != APR_SUCCESS)
+        return rv;
     if (queue_info->terminated) {
         return APR_EOF;
---------------------------------------------------------------------------
If you agree that there is a clear race condition, then to correct it I have
the following suggestions:
(a) Use a dedicated pool for each worker thread; this avoids any locking.
It will perform better but may require a little more memory in situations
where the worker threads are not fully utilized.
(b) Use some technique other than a recycled-pool list that avoids race
conditions.
I am in favour of option (a) until a good idea for (b) comes to mind.
If you agree with (a), I can work on a patch.
Note: I also believe that the crash will happen on Linux; I just never ran the
test for more than an hour on Linux. I will try that tonight.
--
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86
Posted by bu...@apache.org.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=44402>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=44402
------- Additional Comments From basant.kukreja@sun.com 2008-02-21 18:24 -------
Few more updates :
* Probably these crashes also exist on Linux (64 bit). But I can't say for
sure. I saw 3 crashes so far. Out of 3, I get core dump only once and stack
trace from that core dump didn't seem much sense to me so I can't say for sure
that the bug reproduces on Linux or not. (Linux is 64 bit Fedora 8 with 64 bit
apache).
On Solaris, I tried the following things :
* Replaced apr_atomic_casptr with solaris's atomic_casptr. But the result
remained the same. I still saw the crashes. This means that this may not
be the apr bug.
* If I replace apr_atomic_casptr code but keep the for loop then the crashes
disappear.
---------------------------------- ap_queue_info_set_idle-------------
if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
new_recycle, new_recycle->next) ==
new_recycle->next) {
break;
}
---------------------------------- replace with -----------------------
rv = apr_thread_mutex_lock(queue_info->queue_mutex);
if (queue_info->recycled_pools == new_recycle->next) {
queue_info->recycled_pools = new_recycle;
success = 1;
}
rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
---------------------------------- ap_queue_info_wait_for_idler --------------
if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools),
first_pool->next,
first_pool) == first_pool) {
*recycled_pool = first_pool->pool;
break;
}
---------------------------------- replace with ---------------------------
rv = apr_thread_mutex_lock(queue_info->queue_mutex);
if (queue_info->recycled_pools == first_pool) {
queue_info->recycled_pools = next;
success = 1;
}
rv = apr_thread_mutex_unlock(queue_info->queue_mutex);
----------------------------------
------- Additional Comments From rpluem@apache.org 2008-02-14 14:44 -------
As you wrote Solaris on Sun hardware, I suppose you mean SPARC. Have you
checked whether the crashes happen with the same Solaris version on x86?
Background of the question:
ap_queue_info_wait_for_idler uses atomics whose implementation depends on the
hardware architecture. Non-functional atomics could be a source of concurrency
problems under load.
rpluem@apache.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |NEEDINFO
------- Additional Comments From rpluem@apache.org 2008-02-14 12:25 -------
I assume that the ptrans pool somehow gets corrupted. I guess it is used by two
threads in parallel, which could lead to corruption since pools as such are not
thread-safe. So I think a good starting point for further investigation would be
ap_queue_info_wait_for_idler in mpm/worker/fdqueue.c
or lines 731 - 740 in worker.c:
if (ptrans == NULL) {
/* we can't use a recycled transaction pool this time.
* create a new transaction pool */
apr_allocator_t *allocator;
apr_allocator_create(&allocator);
apr_allocator_max_free_set(allocator, ap_max_mem_free);
apr_pool_create_ex(&ptrans, pconf, NULL, allocator);
apr_allocator_owner_set(allocator, ptrans);
}
------- Additional Comments From basant.kukreja@sun.com 2008-02-18 11:33 -------
Regarding the example given in comment #12, I need to correct myself. I agree
with you that the example is not valid for the worker implementation, because
there is a single thread that pops nodes and multiple threads that push
nodes. (ap_queue_info_wait_for_idler is not thread-safe, but it is not called
by multiple threads; it is only invoked by the single listener_thread.)
I have not yet been able to think of a race condition in which a single
popping thread and several pushing threads corrupt the recycled_pools list.
I am still working to find the real cause of the crashes.
------- Additional Comments From basant.kukreja@sun.com 2008-02-14 00:03 -------
With Ruediger's patch, I still got the crash (crash #2). Here is the debug information:
t@314 (l@314) terminated by signal SEGV (Segmentation Fault)
Current function is apr_sockaddr_ip_get
104 *addr = apr_palloc(sockaddr->pool, sockaddr->addr_str_len);
(dbx) where
current thread: t@314
=>[1] apr_sockaddr_ip_get(addr = 0x1ebb4b0, sockaddr = (nil)), line 104 in
"sockaddr.c"
[2] core_create_conn(ptrans = 0x1ebb3b8, server = 0x4c0200, csd = 0x1ebb728,
id = 311, sbh = 0x1ebb458, alloc = 0x2128b58), line 3895 in "core.c"
[3] ap_run_create_connection(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x4602c3
[4] process_socket(p = 0x1ebb3b8, sock = 0x1ebb728, my_child_num = 0,
my_thread_num = 311, bucket_alloc = 0x2128b58), line 566 in "worker.c"
[5] worker_thread(thd = 0x7195c8, dummy = 0x6e2310), line 923 in "worker.c"
[6] dummy_worker(opaque = 0x7195c8), line 142 in "thread.c"
[7] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5d8f7
[8] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef5dba0
(dbx) p *addr
*addr = (nil)
(dbx) up
Current function is core_create_conn
3895 apr_sockaddr_ip_get(&c->local_ip, c->local_addr);
(dbx) p *c
*c = {
pool = 0x1ebb3b8
base_server = (nil)
vhost_lookup_data = (nil)
local_addr = (nil)
remote_addr = (nil)
remote_ip = (nil)
remote_host = (nil)
remote_logname = (nil)
aborted = 0
keepalive = AP_CONN_UNKNOWN
double_reverse = 0
keepalives = 0
local_ip = (nil)
local_host = (nil)
id = 0
conn_config = 0x1ebb508
notes = 0x1ebba48
input_filters = (nil)
output_filters = (nil)
sbh = 0x1ebb458
bucket_alloc = (nil)
cs = (nil)
data_in_input_filters = 0
}
(dbx) dump
alloc = 0x2128b58
rv = 0
ptrans = 0x1ebb3b8
server = 0x4c0200
sbh = 0x1ebb458
c = 0x1ebb460
id = 311
csd = 0x1ebb728
(dbx) p csd
csd = 0x1ebb728
(dbx) p *(struct apr_socket_t*) csd
*((struct apr_socket_t *) csd) = {
pool = (nil)
socketdes = 0
type = 0
protocol = 0
local_addr = (nil)
remote_addr = (nil)
timeout = 0
local_port_unknown = 0
local_interface_unknown = 0
remote_addr_unknown = 0
options = 0
inherit = 0
userdata = (nil)
}
------- Additional Comments From basant.kukreja@sun.com 2008-02-12 17:14 -------
I tried to debug the crash:
=>[1] allocator_free(allocator = 0x101f870, node = (nil)), line 331 in "apr_pools.c"
[2] apr_pool_clear(pool = 0x102fb88), line 710 in "apr_pools.c"
[3] ap_core_output_filter(f = 0x1020550, b = 0x101f9e8), line 899 in
"core_filters.c"
In ap_core_output_filter, the crash happens when apr_pool_clear is called for
the deferred_write_pool:
apr_pool_clear(ctx->deferred_write_pool);
On further investigation, I found that for ctx->deferred_write_pool, pool->ref
points to pool->next, i.e.
pool->ref == &pool->next
Thus in apr_pool_clear:
if (active->next == active)
return;
*active->ref = NULL; // ---> this sets active->next to NULL, because
// active->ref points to active->next
allocator_free(pool->allocator, active->next);
This situation does not arise with a normal single connection; it happens only
under stress. With a normal connection, active->next == active, and the
function returns early from apr_pool_clear (when called for
ctx->deferred_write_pool).
------- Additional Comments From rpluem@apache.org 2008-02-14 14:55 -------
(In reply to comment #10)
> As you wrote Solaris on Sun I suppose you mean on SPARC. Have you checked if the
> crashes happen with the same Solaris version on x86? Background of the question:
Oops, my fault: you already said that you are using x86. Nevertheless, does the
same happen on SPARC with the same Solaris version, or if you compile with
--enable-nonportable-atomics=no?
basant.kukreja@sun.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Worker mpm crashes (SEGV) |Worker mpm crashes (SEGV)
|under stress with static |under stress with static
|workload on 64 bit solaris |workload on solaris x86
|x86 |
------- Additional Comments From basant.kukreja@sun.com 2008-02-12 19:15 -------
Crashes are happening with 32-bit Apache too, therefore I am changing the summary.